Abstract. Association rules are among the most widely employed data analysis methods
in the field of Data Mining. An association rule is a form of partial implication between two 
sets of binary variables. In the most common approach, association rules are parametrized 
by a lower bound on their confidence, which is the empirical conditional probability of 
their consequent given the antecedent, and/or by some other parameter bounds such as 
"support" or deviation from independence. We study here notions of redundancy among 
association rules from a fundamental perspective. We see each transaction in a dataset as 
an interpretation (or model) in the propositional logic sense, and consider existing notions 
of redundancy, that is, of logical entailment, among association rules, of the form "any
dataset in which this first rule holds must obey also that second rule, therefore the second
is redundant". We discuss several existing alternative definitions of redundancy between
association rules and provide new characterizations and relationships among them. We 
show that the main alternatives we discuss correspond actually to just two variants, which 
differ in the treatment of full-confidence implications. For each of these two notions of 
redundancy, we provide a sound and complete deduction calculus, and we show how to 
construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of 
the number of rules. We explore finally an approach to redundancy with respect to several 
association rules, and fully characterize its simplest case of two partial premises. 



The relatively recent discipline of Data Mining involves a wide spectrum of techniques, 
inherited from different origins such as Statistics, Databases, or Machine Learning. Among 
them, Association Rule Mining is a prominent conceptual tool and, possibly, a cornerstone 
notion of the field, if there is one. Currently, the amount of available knowledge regarding 
association rules has grown to the extent that the tasks of creating complete surveys and 
websites that maintain pointers to related literature become daunting. A survey, with plenty 
of references, is [12], and additional materials are available in [25]; see also [2], [3], [18], [36],
[44], [45], and the references and discussions in their introductory sections.

Given an agreed general set of "items", association rules are defined with respect to a
dataset that consists of "transactions", each of which is, essentially, a set of items.
Association rules are customarily written as X → Y, for sets of items X and Y, and they hold in
the given dataset with a specific "confidence" quantifying how often Y appears among the
transactions in which X appears.

A close relative of the notion of association rule, namely, that of exact implication in
the standard propositional logic framework, or, equivalently, association rule that holds
in 100% of the cases, has been studied in several guises. Exact implications are
equivalent to conjunctions of definite Horn clauses: the fact, well-known in logic and knowledge
representation, that Horn theories are exactly those closed under bitwise intersection of
propositional models leads to a strong connection with Closure Spaces, which are
characterized by closure under intersection (see the discussions in [15] or [26]). Implications are
also very closely related to functional dependencies in databases. Indeed, implications, as
well as functional dependencies, enjoy analogous, clear, robust, hardly disputable notions
of redundancy that can be defined equivalently both in semantic terms and through the
same syntactic calculus. Specifically, for the semantic notion of entailment, an implication
X → Y is entailed from a set of implications ℛ if every dataset in which all the implications
of ℛ hold must also satisfy X → Y; and, syntactically, it is known that this happens if and
only if X → Y is derivable from ℛ via the Armstrong axiom schemes, namely, Reflexivity
(X → Y for Y ⊆ X), Augmentation (if X → Y and X' → Y' then XX' → YY', where
juxtaposition denotes union) and Transitivity (if X → Y and Y → Z then X → Z).
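For comparison with the partial case discussed below, derivability by the Armstrong schemes can be decided without enumerating derivations: the standard closure algorithm from dependency theory repeatedly fires every implication whose antecedent is already contained in the current set, and X → Y is derivable from ℛ exactly when Y lies inside the resulting closure of X. The Python sketch below is only an illustration under our own assumptions: itemsets are frozensets of single-character item names (so frozenset("ab") is {a, b}), implications are (antecedent, consequent) pairs, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]
Implication = Tuple[Itemset, Itemset]  # (X, Y) stands for X -> Y


def implication_closure(x: Itemset, rules: Iterable[Implication]) -> Itemset:
    """Smallest superset of x that is closed under every implication in rules."""
    rule_list: List[Implication] = list(rules)
    closed = set(x)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rule_list:
            # An implication "fires" once its whole antecedent is present.
            if antecedent <= closed and not consequent <= closed:
                closed |= consequent
                changed = True
    return frozenset(closed)


def armstrong_derivable(x: Itemset, y: Itemset, rules: Iterable[Implication]) -> bool:
    """X -> Y is derivable from rules by the Armstrong schemes iff Y ⊆ closure(X)."""
    return y <= implication_closure(x, rules)


R = [(frozenset("a"), frozenset("b")), (frozenset("b"), frozenset("c"))]
print(armstrong_derivable(frozenset("a"), frozenset("c"), R))  # True, via Transitivity
print(armstrong_derivable(frozenset("c"), frozenset("a"), R))  # False
```

The fixpoint loop is the simplest correct formulation; linear-time variants with counters exist in the dependency-theory literature but add bookkeeping that is not needed here.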

Also, such studies have provided a number of ways to find implications (or functional 
dependencies) that hold in a given dataset, and to construct small subsets of a large set 
of implications, or of functional dependencies, from which the whole set can be derived; in 
Closure Spaces and in Data Mining these small sets are usually called "bases", whereas in 
Dependency Theory they are called "covers", and they are closely related to deep topics 
such as hypergraph theory. Associated natural notions of minimality (when no implication 
can be removed), minimum size, and canonicity of a cover or basis do exist; again it is 
inappropriate to try to give a complete set of references here, but see, for instance, [T5],
PI], n [23], [21], [26], [37], 03], [15], and the references therein.

However, the fact has been long acknowledged (e.g., already in [33]) that, often, it is
inappropriate to search only for absolute implications in the analysis of real-world datasets.
Partial rules are defined in relation to their "confidence": for a given rule X → Y, the ratio
of how often X and Y are seen together to how often X is seen. Many other alternative
measures of intensity of implication exist [20], [21]; we keep our focus on confidence because, 
besides being among the most common ones, it has a natural interpretation for educated 
users through its correspondence with the observed conditional probability. 

The idea of restricting the exploration for association rules to frequent itemsets, with
respect to a support threshold, gave rise to the most widely discussed and applied
algorithm, called Apriori [3], and to an intense research activity. Already with full-confidence
implications, the output of an association mining process often consists of large sets of rules,
and a well-known difficulty in applied association rule mining lies in that, on large datasets,
and for sensible settings of the confidence and support thresholds and other parameters,
huge amounts of association rules are often obtained. Therefore, besides the interesting
progress in the topic of how to organize and query the rules discovered (see [H], [32], [42]),
one research topic that has been worthy of attention is the identification of patterns that
indicate redundancy of rules, and ways to avoid that redundancy; and each proposed notion
of redundancy opens up a major research problem, namely, to provide a general method for
constructing bases of minimum size with respect to that notion of redundancy.

For partial rules, the Armstrong schemes are not valid anymore. Reflexivity does hold,
but Transitivity takes a different form that affects the confidence of the rules: if the rule
A → B (or A → AB, which is equivalent) and the rule B → C both hold with confidence
at least γ, we still know nothing about the confidence of A → C; even the fact that both
A → AB and AB → C hold with confidence at least γ only gives us a confidence lower
bound of γ² < γ for A → C (assuming γ < 1), since c(A → C) ≥ s(ABC)/s(A) =
c(A → AB) · c(AB → C). Augmentation does not hold at all; indeed,
enlarging the antecedent of a rule of confidence at least γ may give a rule with much smaller
confidence, even zero: think of a case where most of the times X appears it comes with Z,
but it only comes with Y when Z is not present; then the confidence of X → Z may be high
whereas the confidence of XY → Z may be null. Similarly, if the confidence of X → YZ is
high, it means that Y and Z appear together in most of the transactions having X, whence
the confidences of X → Y and X → Z are also high; but, with respect to the converse,
the fact that both Y and Z appear in fractions at least γ of the transactions having X
does not inform us that they show up together at a similar ratio of these transactions:
only a ratio of 2γ − 1 < γ is guaranteed as a lower bound, because s(XYZ) ≥ s(XY) + s(XZ) − s(X).
In fact, if we look only for association rules with singletons as consequents (as in some of
the analyses in [1], or in the "basic association rules" of [30], or even in the traditional
approach to association rules [2] and the useful apriori implementation of Borgelt available
on the web [8]) we are almost certain to lose information. As a consequence of these failures
of the Armstrong schemes, the canonical and minimum-size cover construction methods
available for implications or functional dependencies are not appropriate for partial
association rules.
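The three failures just described can be replayed on tiny hand-built datasets. The sketch below is only illustrative: the datasets are made up for this purpose, transaction identifiers are simply list positions, the helper names are hypothetical, and exact fractions are used so that γ² and 2γ − 1 can be compared without rounding, with γ = 4/5.

```python
from fractions import Fraction
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def items(s: str) -> Itemset:
    """Hypothetical shorthand: items("xyz") == frozenset({"x", "y", "z"})."""
    return frozenset(s)


def support(dataset: List[Itemset], x: Itemset) -> int:
    """Number of transactions that contain every item of x."""
    return sum(1 for t in dataset if x <= t)


def confidence(dataset: List[Itemset], x: Itemset, y: Itemset) -> Fraction:
    """Exact confidence s(XY)/s(X), taken as 1 when s(X) = 0."""
    sx = support(dataset, x)
    return Fraction(support(dataset, x | y), sx) if sx else Fraction(1)


# Transitivity: A -> AB and AB -> C both at 4/5, yet A -> C only at (4/5)**2.
trans = [items("abc")] * 16 + [items("ab")] * 4 + [items("a")] * 5
# Augmentation: X -> Z at 9/10, but XY -> Z at 0 (Y only shows up without Z).
aug = [items("xz")] * 9 + [items("xy")]
# Consequents: X -> Y and X -> Z at 4/5, yet X -> YZ only at 2*(4/5) - 1.
union = [items("xyz")] * 6 + [items("xy")] * 2 + [items("xz")] * 2

for name, data, x, y in [("A -> C", trans, "a", "c"),
                         ("XY -> Z", aug, "xy", "z"),
                         ("X -> YZ", union, "x", "yz")]:
    print(name, confidence(data, items(x), items(y)))
```

Each dataset attains its bound exactly, which shows that the lower bounds γ² and 2γ − 1 cannot be improved in general.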

On the semantic side, a number of formalizations of the intuition of redundancy among
association rules exist in the literature, often with proposals for defining irredundant bases
(see [1], [13], [27], [33], [36], [38], 03], the survey [29], and section 6 of the survey |H]).
All of these are weaker than the notion that we would consider natural by comparison with
implications (of which we start the study in the last section of this paper). We observe here
that one may wish to fulfill two different roles with a basis, and that both appear (somewhat
mixed) in the literature: as a computer-supported data structure from which confidences
and supports of rules are computed (a role for which we use the closures lattice instead) or,
in our choice, as a means of providing the user with a smallish set of association rules for
examination and, if convenient, posterior enumeration of the rules that follow from each rule
in the basis. That is, we will neither assume that exact confidence values are available nor
wish to compute them, but only discern whether the confidence stays above a certain
user-defined threshold. We compute actual confidences out of the closure lattice only at the
time of writing out rules for the user.

This paper focuses mainly on several such notions of redundancy, defined in a rather
general way, by resorting to confidence and support inequalities: essentially, a rule is
redundant with respect to another if it has at least the same confidence and support as the
latter for every dataset. We also discuss variants of this proposal and other existing definitions
given in set-theoretic terms. For the most basic notion of redundancy, we provide formal
proofs of the so far unstated equivalence among several published proposals, including a
syntactic calculus and a formal proof of the fact, also previously unknown, that the
existing basis known as the Essential Rules or the Representative Rules ([1], [27], [38]) is of
absolutely minimum size.

It is natural to wish for further progress in reducing the size of the basis. Our theorems
indicate that, in order to reduce the size further without losing information, more powerful
notions of redundancy must be deployed. We consider for this role the proposal of handling
full-confidence implications separately, to a given extent, from lower-than-1-confidence rules,
in order to profit from their very different combinatorics. This separation is present in many
constructions of bases for association rules [33], [36], [Hj. We discuss corresponding notions
of redundancy and completeness, and prove new properties of these notions; we give a
sound and complete deductive calculus for this redundancy; and we refine the existing
basis constructions up to a point where we can prove again that we attain the limit of the
redundancy notion.

Next, we discuss yet another potential for strengthening the notion of redundancy. So
far, all the notions have just related one partial rule to another, possibly in the presence of
full implications. Is it possible to combine two partial rules, of confidence at least γ, and
still obtain a partial rule obeying that confidence level? Whereas the intuition is that these
confidences will combine together to yield a confidence lower than γ, we prove that there is
a specific case where a rule of confidence at least γ is nontrivially entailed by two of them.
We fully characterize this case and obtain from the characterization yet another deduction
scheme. We hope that further progress along the notion of a set of partial rules entailing a
partial rule will be made in the coming years.

Preliminary versions of the results in sections 3.1, 4.2, 4.3, and 5 have been presented at
Discovery Science 2008 [6]; preliminary versions of the remaining results (except those in
section 4.5, which are newer and unpublished) have been presented at ECMLPKDD 2008 [5].



2. Preliminaries 

Our notation and terminology are quite standard in the Data Mining literature. All our
developments take place in the presence of a "universe" set 𝒰 of atomic elements called items;
their absence or presence in sets of items plays the same role as binary-valued attributes
of a relational table. Subsets of 𝒰 are called itemsets. A dataset D is assumed to be given;
it consists of transactions, each of which is an itemset labeled by a unique transaction
identifier. The identifiers allow us to distinguish among transactions even if they share
the same itemset. Upper-case, often subscripted letters from the end of the alphabet, like
X₁ or Y₀, denote itemsets. Juxtaposition denotes union of itemsets, as in XY; and Z ⊂ X
denotes proper subsets, whereas Z ⊆ X is used for the usual subset relationship with
potential equality.

For a transaction t, we denote t ⊨ X the fact that X is a subset of the itemset
corresponding to t, that is, the transaction satisfies the minterm corresponding to X in the 
propositional logic sense. 

From the given dataset we obtain a notion of support of an itemset: s_D(X) is the
cardinality of the set of transactions that include it, {t ∈ D | t ⊨ X}; sometimes, abusing
language slightly, we also refer to that set of transactions itself as support. Whenever D
is clear, we drop the subindex: s(X). Observe that s(X) ≥ s(Y) whenever X ⊆ Y; this
is immediate from the definition. Note that many references resort to a normalized notion
of support by dividing by the dataset size. We chose not to, but there is no essential issue
here. Often, research work in Data Mining assumes that a threshold on the support has been
provided and that only sets whose support is above the threshold (then called "frequent")
are to be considered. We will require this additional constraint occasionally for the sake
of discussing the applicability of our developments.

We immediately obtain by standard means (see, for instance, [19] or [33]) a notion of
closed itemsets, namely, those that cannot be enlarged while maintaining the same support. 
The function that maps each itemset to the smallest closed set that contains it is known 
to be monotonic, extensive, and idempotent, that is, it is a closure operator. This notion 
will be reviewed in more detail later on. Closed sets whose support is above the support 
threshold, if given, are usually termed closed frequent sets. 
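Under the usual Galois-connection view, the closure of an itemset X is the intersection of all transactions that contain X, that is, the largest itemset with the same support as X. The following brute-force Python sketch computes it on a made-up four-transaction dataset and lists the closed sets; items are single characters, the names are illustrative, and returning the whole universe when no transaction contains X is a common convention that we assume here.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def support(dataset: List[Itemset], x: Itemset) -> int:
    return sum(1 for t in dataset if x <= t)


def closure(dataset: List[Itemset], universe: Itemset, x: Itemset) -> Itemset:
    """Largest itemset with the same support as x: the intersection of all
    transactions containing x, or the whole universe if no transaction does."""
    covering = [t for t in dataset if x <= t]
    return frozenset.intersection(*covering) if covering else universe


def closed_itemsets(dataset: List[Itemset], universe: Itemset) -> Set[Itemset]:
    """Brute force: close every subset of the (small) universe."""
    subsets = (frozenset(c) for r in range(len(universe) + 1)
               for c in combinations(sorted(universe), r))
    return {closure(dataset, universe, x) for x in subsets}


U = frozenset("abc")
D = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
print(sorted("".join(sorted(c)) for c in closed_itemsets(D, U)))
# ['', 'a', 'ab', 'abc', 'ac', 'b']
```

Enumerating all subsets is exponential in |𝒰| and only meant to make the definition concrete; practical closed-set miners avoid it.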

Association rules are pairs of itemsets, denoted as X → Y for itemsets X and Y.
Intuitively, they suggest the fact that Y occurs particularly often among the transactions in
which X occurs. More precisely, each such rule has a confidence associated: the confidence
c_D(X → Y) of an association rule X → Y in a dataset D is s_D(XY)/s_D(X). As with support,
often we drop the subindex D. The support in D of the association rule X → Y is
s_D(X → Y) = s_D(XY).

We can switch rather freely between right-hand sides that include the left-hand side 
and right-hand sides that don't: 

Definition 2.1. Rules X₀ → Y₀ and X₁ → Y₁ are equivalent by reflexivity if X₀ = X₁ and
X₀Y₀ = X₁Y₁.

Clearly, c_D(X → Y) = c_D(X → XY) = c_D(X → X'Y) and, likewise, s_D(X → Y) =
s_D(X → XY) = s_D(X → X'Y) for any X' ⊆ X; that is, the support and confidence of
rules that are equivalent by reflexivity always coincide. A minor notational issue that we
must point out is that, in some references, the left-hand side of a rule is required to be 
a subset of the right-hand side, as in [33] or [38], whereas many others require the left- 
and right-hand sides of an association rule to be disjoint, such as [29] or the original [2]. 
Both the rules whose left-hand side is a subset of the right-hand side, and the rules that 
have disjoint sides, may act as canonical representatives for the rules equivalent to them by 
reflexivity. We state explicitly one version of this immediate fact for later reference: 

Proposition 2.2. If rules X₀ → Y₀ and X₁ → Y₁ are equivalent by reflexivity, X₀ ∩ Y₀ = ∅,
and X₁ ∩ Y₁ = ∅, then they are the same rule: X₀ = X₁ and Y₀ = Y₁.
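Definition 2.1 and Proposition 2.2 can be made concrete with a few lines of code. In the sketch below, rules are pairs of frozensets of single-character items and all names are illustrative assumptions; it tests equivalence by reflexivity, maps each rule to its disjoint-sides representative X → Y − X, and confirms on a made-up dataset that equivalent rules share the same confidence.

```python
from fractions import Fraction
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (X, Y) stands for X -> Y


def equivalent_by_reflexivity(r0: Rule, r1: Rule) -> bool:
    """Definition 2.1: same antecedent and same union of both sides."""
    (x0, y0), (x1, y1) = r0, r1
    return x0 == x1 and x0 | y0 == x1 | y1


def disjoint_form(rule: Rule) -> Rule:
    """Canonical representative X -> Y - X with disjoint sides (Proposition 2.2)."""
    x, y = rule
    return (x, y - x)


def confidence(dataset: List[Itemset], rule: Rule) -> Fraction:
    x, y = rule
    sx = sum(1 for t in dataset if x <= t)
    sxy = sum(1 for t in dataset if x | y <= t)
    return Fraction(sxy, sx) if sx else Fraction(1)


r0 = (frozenset("ab"), frozenset("c"))    # ab -> c
r1 = (frozenset("ab"), frozenset("abc"))  # ab -> abc
r2 = (frozenset("ab"), frozenset("bc"))   # ab -> bc
D = [frozenset("abc"), frozenset("ab"), frozenset("abd"), frozenset("c")]
print(all(equivalent_by_reflexivity(r0, r) for r in (r1, r2)))
print({disjoint_form(r) for r in (r0, r1, r2)} == {r0})
print({confidence(D, r) for r in (r0, r1, r2)})
```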

In general, we do allow, along our development, rules where the left-hand side, or a part 
of it, appears also at the right-hand side, because by doing so we will be able to simplify the 
mathematical arguments. We will assume here that, at the time of printing out the rules 
found, that is, for user-oriented output, the items in the left-hand side are removed from 
the right-hand side; accordingly, we write our rules sometimes as X → Y − X to recall this
convention. 

Also, many references require the right-hand side of an association rule to be nonempty,
or even both sides. However, empty sets can be handled with no difficulty and do give
meaningful, albeit uninteresting, rules. A rule X → ∅ with an empty right-hand
side is equivalent by reflexivity to X → X, or to X → X' for any X' ⊆ X, and all of these
rules always have confidence 1. A partial rule with empty left-hand side, as employed, for
instance, in [29], actually gives the normalized support of the right-hand side as confidence
value:

Fact 2.3. In a dataset D of n transactions, c(∅ → Y) = s(Y)/n.
Again, these sorts of rules could be omitted from user-oriented output, but considering
them conceptually valid simplifies the mathematical development. We also resort to the
convention that, if s(X) = 0 (which implies that s(XY) = 0 as well), we redefine the
undefined confidence c(X → Y) as 1, since the intuitive expression "all transactions having X
do have also Y" becomes vacuously true. This convention is irrespective of whether Y ≠ ∅.
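The conventions of this section fit in a few lines of code. The sketch below uses a made-up five-transaction dataset (one of them empty) and illustrative function names; it checks Fact 2.3, the confidence 1 of rules with an empty consequent, and the convention adopted for antecedents of zero support.

```python
from fractions import Fraction
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(dataset: List[Itemset], x: Itemset) -> int:
    return sum(1 for t in dataset if x <= t)


def confidence(dataset: List[Itemset], x: Itemset, y: Itemset) -> Fraction:
    """c(X -> Y) = s(XY)/s(X), with the convention c = 1 whenever s(X) = 0."""
    sx = support(dataset, x)
    if sx == 0:
        return Fraction(1)  # vacuously true: no transaction has X
    return Fraction(support(dataset, x | y), sx)


D = [frozenset("ab"), frozenset("b"), frozenset("bc"), frozenset("a"), frozenset("")]
empty = frozenset()
n = len(D)
print(confidence(D, empty, frozenset("b")), Fraction(support(D, frozenset("b")), n))
print(confidence(D, frozenset("a"), empty))           # empty consequent
print(confidence(D, frozenset("d"), frozenset("c")))  # s(d) = 0
```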

Throughout the paper, "implications" are association rules of confidence 1, whereas
"partial rules" are those having a confidence below 1. When the confidence could be 1 or
could be less, we say simply "rule".



3. Redundancy Notions 

We start our analysis from one of the notions of redundancy defined formally in [1]. The
notion is employed also, generally with no formal definition, in several papers on association
rules, which subsequently formalize and study just some particular cases of redundancy
(e.g., [27], [40]); thus, we have chosen to qualify this redundancy as "standard". We propose
also a small variation, seemingly less restrictive; we have not found that variant explicitly 
defined in the literature, but it is quite natural. 

Definition 3.1. 

(1) [1] X₀ → Y₀ has standard redundancy with respect to X₁ → Y₁ if the confidence and
support of X₀ → Y₀ are larger than or equal to those of X₁ → Y₁, in all datasets.

(2) X₀ → Y₀ has plain redundancy with respect to X₁ → Y₁ if the confidence of X₀ → Y₀
is larger than or equal to the confidence of X₁ → Y₁, in all datasets.

Generally, we will be interested in applying these definitions only to rules X₀ → Y₀ where
Y₀ ⊈ X₀ since, otherwise, c(X₀ → Y₀) = 1 for all datasets and the rule is trivially redundant.
We state and prove separately, for later use, the following new technical claim:

Lemma 3.2. Assume that rule X₀ → Y₀ is plainly redundant with respect to rule X₁ → Y₁,
and that Y₀ ⊈ X₀. Then X₀Y₀ ⊆ X₁Y₁.

Proof. Assume X₀Y₀ ⊈ X₁Y₁, to argue the contrapositive. Then, we can consider a dataset
consisting of one transaction X₀ and, say, m transactions X₁Y₁. No transaction includes
X₀Y₀ (the first one because Y₀ ⊈ X₀, the others because X₀Y₀ ⊈ X₁Y₁), therefore
c(X₀ → Y₀) = 0; however, c(X₁ → Y₁) is either 1 (if X₁ ⊈ X₀) or m/(m + 1) (if X₁ ⊆ X₀),
which can be pushed up as much as desired by simply increasing m. Then, plain redundancy
does not hold, because it requires c(X₀ → Y₀) ≥ c(X₁ → Y₁) to hold for all datasets whereas,
for this particular dataset, the inequality fails. □
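The construction in this proof is easy to replay. The sketch below assumes itemsets are frozensets of single-character items and uses a hypothetical helper, lemma_3_2_dataset, that builds one transaction X₀ plus m transactions X₁Y₁; it shows c(X₀ → Y₀) stuck at 0 while c(X₁ → Y₁) approaches 1 as m grows.

```python
from fractions import Fraction
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def confidence(dataset: List[Itemset], x: Itemset, y: Itemset) -> Fraction:
    sx = sum(1 for t in dataset if x <= t)
    return Fraction(sum(1 for t in dataset if x | y <= t), sx) if sx else Fraction(1)


def lemma_3_2_dataset(x0: Itemset, y0: Itemset, x1: Itemset, y1: Itemset,
                      m: int) -> List[Itemset]:
    """One transaction X0 plus m transactions X1Y1, as in the proof of Lemma 3.2."""
    if y0 <= x0 or (x0 | y0) <= (x1 | y1):
        raise ValueError("needs Y0 ⊈ X0 and X0Y0 ⊈ X1Y1")
    return [x0] + [x1 | y1] * m


a, b, c = frozenset("a"), frozenset("b"), frozenset("c")
for m in (1, 10, 100):
    D = lemma_3_2_dataset(a, c, a, b, m)  # X0 -> Y0 is a -> c, X1 -> Y1 is a -> b
    print(m, confidence(D, a, c), confidence(D, a, b))
```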

The first use of this lemma is to show that plain redundancy is not, actually, weaker 
than standard redundancy. 

Theorem 3.3. Consider any two rules X₀ → Y₀ and X₁ → Y₁ where Y₀ ⊈ X₀. Then
X₀ → Y₀ has standard redundancy with respect to X₁ → Y₁ if and only if X₀ → Y₀ has
plain redundancy with respect to X₁ → Y₁.

Proof. Standard redundancy clearly implies plain redundancy by definition. Conversely,
plain redundancy implies, first, c(X₀ → Y₀) ≥ c(X₁ → Y₁) by definition and, further,
X₀Y₀ ⊆ X₁Y₁ by Lemma 3.2; this implies in turn s(X₀ → Y₀) = s(X₀Y₀) ≥ s(X₁Y₁) =
s(X₁ → Y₁), for all datasets, and standard redundancy holds. □
The reference [1] also provides two more direct definitions of redundancy:
Definition 3.4.

(1) if X₁ ⊂ X₀ and X₀Y₀ = X₁Y₁, rule X₀ → Y₀ is simply redundant with respect to
X₁ → Y₁;

(2) if X₁ ⊆ X₀ and X₀Y₀ ⊂ X₁Y₁, rule X₀ → Y₀ is strictly redundant with respect to
X₁ → Y₁.

Simple redundancy in [1] is explained as a potential connection between rules that come from
the same frequent set, in our case X₀Y₀ = X₁Y₁. The formal definition is not identical to our
rendering: in its original statement in [1], rule XZ → Y is simply redundant with respect
to X → YZ, provided that Z ≠ ∅. The reason is that, in that reference, rules are always
assumed to have disjoint sides, and then both formalizations are clearly equivalent. We do
not impose disjointness, so that the natural formalization of their intuitive explanation is
as we have just stated in Definition 3.4. The following is very easy to see (and is formally
proved in [1]).

Fact 3.5. [1] Both simple and strict redundancies imply standard redundancy.

Note that, in principle, there could possibly be many other ways of being redundant 
beyond simple and strict redundancies: we show below, however, that, in essence, this is 
not the case. We can relate these notions also to the cover operator of [27]: 

Definition 3.6. [27] Rule X₁ → Y₁ covers rule X₀ → Y₀ when X₁ ⊆ X₀ and X₀Y₀ ⊆ X₁Y₁.

Here, again, the original definition, according to which rule X → Y covers rule XZ → Y'
if Z ⊆ Y and Y' ⊆ Y (plus some disjointness and nonemptiness conditions that we omit),
is appropriate for the case of disjoint sides. The formalization we give is stated also in [27]
as a property that characterizes covering. Both simple and strict redundancies thus become
merged into a single definition. We observe as well that the same notion is also employed,
without an explicit name, in

Again, it should be clear that, in Definition 3.6, the covered rule is indeed plainly
redundant: whatever the dataset, changing from X₁ → Y₁ to X₀ → Y₀ the confidence stays
equal or increases since, in the quotient s(XY)/s(X) that defines the confidence of a rule
X → Y, the numerator cannot decrease from s(X₁Y₁) to s(X₀Y₀), whereas the denominator
cannot increase from s(X₁) to s(X₀). Also, the proposals in Definitions 3.4 and 3.6 are clearly
equivalent:

Fact 3.7. Rule X₁ → Y₁ covers rule X₀ → Y₀ if and only if rule X₀ → Y₀ is either
simply redundant or strictly redundant with respect to X₁ → Y₁, or they are equivalent by
reflexivity.
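Fact 3.7 is a finite case analysis, so it can be confirmed mechanically on a small universe. The sketch below (illustrative names, rules as pairs of frozensets) implements the cover test of Definition 3.6 and the three cases of Definitions 2.1 and 3.4, and checks that they agree on all 64 × 64 pairs of rules over the items a, b, c; this is a check on one universe, not a proof.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Optional, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (X, Y) stands for X -> Y


def covers(r1: Rule, r0: Rule) -> bool:
    """Definition 3.6: r1 covers r0 when X1 ⊆ X0 and X0Y0 ⊆ X1Y1."""
    (x1, y1), (x0, y0) = r1, r0
    return x1 <= x0 and (x0 | y0) <= (x1 | y1)


def relation(r0: Rule, r1: Rule) -> Optional[str]:
    """Which case of Fact 3.7, if any, makes r0 redundant with respect to r1."""
    (x0, y0), (x1, y1) = r0, r1
    u0, u1 = x0 | y0, x1 | y1
    if x1 == x0 and u0 == u1:
        return "equivalent by reflexivity"  # Definition 2.1
    if x1 < x0 and u0 == u1:
        return "simply redundant"           # Definition 3.4 (1)
    if x1 <= x0 and u0 < u1:
        return "strictly redundant"         # Definition 3.4 (2)
    return None


def all_itemsets(universe: str) -> List[Itemset]:
    return [frozenset(c) for r in range(len(universe) + 1)
            for c in combinations(universe, r)]


rules = list(product(all_itemsets("abc"), repeat=2))  # all 64 rules over {a, b, c}
print(all(covers(r1, r0) == (relation(r0, r1) is not None)
          for r0 in rules for r1 in rules))
```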

It turns out that all these notions are, in fact, fully equivalent to plain redundancy; 
indeed, the following converse statement is a main new contribution of this section: 

Theorem 3.8. Assume rule X₀ → Y₀ is plainly redundant with respect to X₁ → Y₁, where
Y₀ ⊈ X₀. Then rule X₁ → Y₁ covers rule X₀ → Y₀.

Proof. By Lemma 3.2, X₀Y₀ ⊆ X₁Y₁. To see the other inclusion, X₁ ⊆ X₀, assume to the
contrary that X₁ ⊈ X₀. Then we can consider a dataset in which one transaction consists
of X₁Y₁ and, say, m transactions consist of X₀. Since X₁ ⊈ X₀, these m transactions do
not count towards the supports of X₁ or X₁Y₁, so that the confidence of X₁ → Y₁ is 1;
also, X₀ is not adding to the support of X₀Y₀ since Y₀ ⊈ X₀. As X₀Y₀ ⊆ X₁Y₁, exactly
one transaction includes X₀Y₀, whereas all m + 1 transactions include X₀, so that
c(X₀ → Y₀) = 1/(m + 1), which can be made as low as desired. This would contradict plain
redundancy. Hence, plain redundancy implies the two inclusions in the definition of cover. □
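As with Lemma 3.2, the dataset of this proof can be built and measured directly. The sketch below uses a hypothetical helper, theorem_3_8_dataset, and the illustrative rules a → b (playing X₀ → Y₀) and c → ab (playing X₁ → Y₁, with c ⊈ a); it shows the covering-candidate rule at confidence 1 while the other rule drops to 1/(m + 1).

```python
from fractions import Fraction
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def confidence(dataset: List[Itemset], x: Itemset, y: Itemset) -> Fraction:
    sx = sum(1 for t in dataset if x <= t)
    return Fraction(sum(1 for t in dataset if x | y <= t), sx) if sx else Fraction(1)


def theorem_3_8_dataset(x0: Itemset, y0: Itemset, x1: Itemset, y1: Itemset,
                        m: int) -> List[Itemset]:
    """One transaction X1Y1 plus m transactions X0, as in the proof of Theorem 3.8."""
    if y0 <= x0 or not (x0 | y0) <= (x1 | y1) or x1 <= x0:
        raise ValueError("needs Y0 ⊈ X0, X0Y0 ⊆ X1Y1 and X1 ⊈ X0")
    return [x1 | y1] + [x0] * m


x0, y0 = frozenset("a"), frozenset("b")   # X0 -> Y0 is a -> b
x1, y1 = frozenset("c"), frozenset("ab")  # X1 -> Y1 is c -> ab
for m in (1, 9, 99):
    D = theorem_3_8_dataset(x0, y0, x1, y1, m)
    print(m, confidence(D, x1, y1), confidence(D, x0, y0))
```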

Combining the statements so far, we obtain the following characterization: 

Corollary 3.9. Consider any two rules X₀ → Y₀ and X₁ → Y₁ where Y₀ ⊈ X₀. The
following are equivalent:

(1) X₁ ⊆ X₀ and X₀Y₀ ⊆ X₁Y₁ (that is, rule X₁ → Y₁ covers rule X₀ → Y₀);

(2) rule X₀ → Y₀ is either simply redundant or strictly redundant with respect to rule
X₁ → Y₁, or they are equivalent by reflexivity;

(3) rule X₀ → Y₀ is plainly redundant with respect to rule X₁ → Y₁;

(4) rule X₀ → Y₀ has standard redundancy with respect to rule X₁ → Y₁.

Marginally, we note here an additional strength of the proofs given. One could consider
attempts at weakening the notion of plain redundancy by allowing for a "margin" or "slack",
appropriately bounded, but whose value is independent of the dataset, upon comparing
confidences. The slack could be additive or multiplicative: conditions such as
c_D(X₀ → Y₀) ≥ c_D(X₁ → Y₁) − δ or c_D(X₀ → Y₀) ≥ δ c_D(X₁ → Y₁), for all D and for δ
independent of D, could be considered. However, such approaches do not define different
redundancy notions: they result in formulations actually equivalent to plain redundancy.
This is due to the fact that the proofs in Lemma 3.2 and Theorem 3.8 show that the gap
between the confidences of rules that do not exhibit redundancy can be made as large as
desired within (0, 1). Likewise, if we fix a confidence threshold γ ∈ (0, 1) beforehand and use
it to define redundancy as c_D(X₁ → Y₁) ≥ γ ⇒ c_D(X₀ → Y₀) ≥ γ for all D, again an
equivalent notion is obtained, independently of the concrete value of γ; whereas, for γ = 1,
this is, instead, a characterization of Armstrong derivability.



3.1. Deduction Schemes for Plain Redundancy. From the characterization just given,
we extract now a sound and complete deductive calculus. It consists of three inference
schemes: right-hand Reduction (rR), where the consequent is diminished; right-hand
Augmentation (rA), where the consequent is enlarged; and left-hand Augmentation (ℓA), where
the antecedent is enlarged. As customary in logic calculi, our rendering of each rule means
that, if the facts above the line are already derived, we can immediately derive the fact
below the line.

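\[
(\mathrm{rR})\ \frac{X \to Y}{X \to Z}\ (Z \subseteq Y)
\qquad
(\mathrm{rA})\ \frac{X \to Y}{X \to XY}
\qquad
(\ell\mathrm{A})\ \frac{X \to YZ}{XY \to Z}
\]

We also always allow ourselves to state trivial rules:

\[
(\mathrm{r}\emptyset)\ \frac{}{X \to \emptyset}
\]

Clearly, scheme (ℓA) could be stated equivalently with XY → YZ below the line, by (rA):

\[
(\ell\mathrm{A}')\ \frac{X \to YZ}{XY \to YZ}
\]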



In fact, (ℓA) is exactly the simple redundancy from Definition 3.4 and, in the cases
where Y ⊆ X, it provides a way of dealing with one direction of equivalence by reflexivity;
the other direction is a simple combination of the other two schemes. The Reduction
Scheme (rR) allows us to "lose" information from the right-hand side; it corresponds to
strict redundancy.

As further alternative options, it is easy to see that we could also join (rR) and (rA)
into a single scheme:

\[
\frac{X \to Y}{X \to Z}\ (Z \subseteq XY)
\]

but we consider that this option does not really simplify, rather obscures a bit, the proof of
our Corollary 3.10 below. Also, we could allow as trivial rules X → Y whenever Y ⊆ X,
which includes the case of Y = ∅; such rules also follow from the calculus given by combining
(r∅) with (rA) and (rR).
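The schemes translate directly into functions on rules. In the following Python sketch (function names are illustrative; rules are pairs of frozensets), each scheme checks its side condition, and a quick randomized check on 200 made-up datasets, which is evidence rather than proof, confirms that the rule below the line never has lower confidence than the rule above it.

```python
import random
from fractions import Fraction
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (X, Y) stands for X -> Y


def r_reduce(rule: Rule, z: Itemset) -> Rule:
    """(rR): from X -> Y derive X -> Z, for any Z ⊆ Y."""
    x, y = rule
    if not z <= y:
        raise ValueError("(rR) needs Z ⊆ Y")
    return (x, z)


def r_augment(rule: Rule) -> Rule:
    """(rA): from X -> Y derive X -> XY."""
    x, y = rule
    return (x, x | y)


def l_augment(rule: Rule, y: Itemset, z: Itemset) -> Rule:
    """(ℓA): from X -> YZ derive XY -> Z; the caller chooses the split YZ."""
    x, yz = rule
    if y | z != yz:
        raise ValueError("(ℓA) needs the consequent to equal YZ")
    return (x | y, z)


def r_empty(x: Itemset) -> Rule:
    """(r∅): X -> ∅ may always be stated."""
    return (x, frozenset())


def confidence(dataset: List[Itemset], rule: Rule) -> Fraction:
    x, y = rule
    sx = sum(1 for t in dataset if x <= t)
    return Fraction(sum(1 for t in dataset if x | y <= t), sx) if sx else Fraction(1)


# Soundness on random data: applying a scheme never lowers the confidence.
rng = random.Random(0)
base = (frozenset("a"), frozenset("bcd"))
derived = [r_reduce(base, frozenset("bc")), r_augment(base),
           l_augment(base, frozenset("b"), frozenset("cd"))]
ok = True
for _ in range(200):
    D = [frozenset(i for i in "abcd" if rng.random() < 0.6) for _ in range(20)]
    ok &= all(confidence(D, r) >= confidence(D, base) for r in derived)
    ok &= confidence(D, r_empty(frozenset("ab"))) == 1
print(ok)
```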

The following can be derived now from Corollary 3.9:



Corollary 3.10. The calculus given is sound and complete for plain redundancy; that is,
rule X₀ → Y₀ is plainly redundant with respect to rule X₁ → Y₁ if and only if X₀ → Y₀ can
be derived from X₁ → Y₁ using the inference schemes (rR), (rA), and (ℓA).

Proof. Soundness, that is, all rules derived are plainly redundant, is simple to argue by
checking that, in each of the inference schemes, the confidence of the rule below the line is
greater than or equal to the confidence of the rule above the line: these facts are actually
the known statements that each of equivalence by reflexivity, simple redundancy, and strict
redundancy imply plain redundancy. Also, trivial rules with empty right-hand side always
hold. To show completeness, assume that rule X₀ → Y₀ is plainly redundant with respect
to rule X₁ → Y₁. If Y₀ ⊆ X₀, apply (r∅) and use (rA) to copy X₀ and, if necessary, (rR) to
leave just Y₀ in the right-hand side. If Y₀ ⊈ X₀, by Corollary 3.9, we know that this implies
that X₁ ⊆ X₀ and X₀Y₀ ⊆ X₁Y₁. Now, to infer X₀ → Y₀ from X₁ → Y₁, we chain up
applications of our schemes as follows:

\[
X_1 \to Y_1 \ \vdash_{(\mathrm{rA})}\ X_1 \to X_1 Y_1 \ \vdash_{(\mathrm{rR})}\ X_1 \to X_0 Y_0 \ \vdash_{(\ell\mathrm{A})}\ X_0 \to Y_0
\]

where the second step makes use of the inclusion X₀Y₀ ⊆ X₁Y₁, and the last step makes
use of the inclusion X₁ ⊆ X₀. Here, the standard derivation symbol ⊢ denotes derivability
by application of the scheme indicated as a subscript. □
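The completeness half of the proof is constructive, so it can be phrased as a function that returns the derivation. The sketch below uses illustrative names and relies on Corollary 3.9 to decide plain redundancy through the cover test instead of quantifying over all datasets; it emits the three-step chain of the proof, or the (r∅), (rA), (rR) route for trivial rules.

```python
from typing import FrozenSet, List, Optional, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (X, Y) stands for X -> Y
Step = Tuple[str, Rule]         # (scheme used, rule obtained)


def derive(r1: Rule, r0: Rule) -> Optional[List[Step]]:
    """Derivation of r0 from r1 following the completeness proof of Corollary 3.10,
    or None when r0 is not plainly redundant with respect to r1."""
    (x1, y1), (x0, y0) = r1, r0
    if y0 <= x0:  # trivial rule: X0 -> ∅, then X0 -> X0, then X0 -> Y0
        return [("r∅", (x0, frozenset())), ("rA", (x0, x0)), ("rR", (x0, y0))]
    if not (x1 <= x0 and (x0 | y0) <= (x1 | y1)):
        return None  # no cover, hence no plain redundancy (Corollary 3.9)
    return [("rA", (x1, x1 | y1)),  # X1 -> X1Y1
            ("rR", (x1, x0 | y0)),  # X1 -> X0Y0, since X0Y0 ⊆ X1Y1
            ("ℓA", (x0, y0))]       # X1X0 -> Y0 is X0 -> Y0, since X1 ⊆ X0


def show(rule: Rule) -> str:
    x, y = rule
    return f"{''.join(sorted(x)) or '∅'} -> {''.join(sorted(y)) or '∅'}"


r1 = (frozenset("a"), frozenset("bcd"))  # a -> bcd
r0 = (frozenset("ab"), frozenset("c"))   # ab -> c
for scheme, rule in derive(r1, r0):
    print(f"⊢({scheme}) {show(rule)}")
```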

We note here that [38] proposes a simpler calculus that consists, essentially, of (ℓA)
(called there "weak left augmentation") and (rR) (called there "decomposition"). The
point is that these two schemes are sufficient to prove completeness of the "representative 
basis" as given in that reference, due to the fact that, in that version, the rules of the 
representative basis include the left-hand side as part of the right-hand side; but such a 
calculus is incomplete with respect to plain redundancy because it offers no rule to move 
items from left to right. 



3.2. Optimum-Size Basis for Plain Redundancy. A basis is a way of providing a 
shorter list of rules for a given dataset, with no loss of information, in the following sense: 

Definition 3.11. Given a set of rules ℛ, B ⊆ ℛ is a complete basis if every rule of ℛ is
plainly redundant with respect to some rule of B.

Bases are analogous to covers in functional dependencies, and we aim at constructing 
bases with properties that correspond to minimum size and canonical covers. The solutions 
for functional dependencies, however, are not valid for partial rules due to the failure of the 
Armstrong schemes. 
In all practical applications, ℛ is the set of all the rules "mined from" a given dataset D
at a confidence threshold γ ∈ (0, 1]. That is, the basis is a set of rules that hold with
confidence at least γ in D, and such that each rule holds with confidence at least γ in D
if and only if it is plainly redundant with respect to some rule of B; equivalently, the rules
in ℛ can be inferred from B through the corresponding deductive calculus. All along this
paper, such a confidence threshold is denoted γ, and always γ > 0. We will employ two
simple but useful definitions.

Definition 3.12. Fix a dataset D. Given itemsets Y and X ⊆ Y, X is a γ-antecedent for
Y if c(X → Y) ≥ γ, that is, s(Y) ≥ γ s(X).

Note that we allow X = Y, that is, the set itself as its own 7-antecedent; this is just to 
simplify the statement of the following rather immediate lemma: 

Lemma 3.13. If X is a 7-antecedent for Y and X C Z C Y , then X is a 7-antecedent 
for Z and Z is a 7-antecedent for Y . 

Proof. From $X \subseteq Z \subseteq Y$ we have $s(X) \geq s(Z) \geq s(Y)$, so that $s(Z) \geq s(Y) \geq \gamma s(X) \geq \gamma s(Z)$. The lemma follows. □
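To make Definition 3.12 and Lemma 3.13 concrete, here is a minimal Python sketch. The toy dataset `DATASET` and the helpers `support`, `is_gamma_antecedent`, `subsets`, and `check_lemma_3_13` are hypothetical illustrations, not part of this paper; the threshold is passed as a `Fraction` so that ties such as $s(Y) = \gamma s(X)$ are compared exactly.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
DATASET = [frozenset(t) for t in ["ABC", "ABC", "AB", "ABD", "CD", "BD"]]


def support(itemset, dataset):
    """s(X): number of transactions containing every item of X."""
    return sum(1 for t in dataset if itemset <= t)


def is_gamma_antecedent(x, y, gamma, dataset):
    """Definition 3.12: for X a subset of Y, check s(Y) >= gamma * s(X)."""
    if not x <= y:
        raise ValueError("X must be a subset of Y")
    return support(y, dataset) >= gamma * support(x, dataset)


def subsets(itemset):
    """All subsets of an itemset, smallest first (the empty set included)."""
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def check_lemma_3_13(gamma, dataset):
    """Exhaustively test Lemma 3.13 on every chain X <= Z <= Y of items."""
    universe = frozenset().union(*dataset)
    for y in subsets(universe):
        for z in subsets(y):
            for x in subsets(z):
                if is_gamma_antecedent(x, y, gamma, dataset):
                    assert is_gamma_antecedent(x, z, gamma, dataset)
                    assert is_gamma_antecedent(z, y, gamma, dataset)
    return True


print(check_lemma_3_13(Fraction(3, 4), DATASET))  # True
```

The exhaustive check enumerates every chain $X \subseteq Z \subseteq Y$, so it only scales to a handful of items; it is meant as a sanity test of the lemma, not as a mining procedure.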

The next notion additionally requires antecedents to be proper subsets:

Definition 3.14. Fix a dataset $\mathcal{D}$. Given itemsets $Y$ and $X \subset Y$ (proper subset), $X$ is a valid $\gamma$-antecedent for $Y$ if the following holds:

(1) $X$ is a $\gamma$-antecedent of $Y$,

(2) no proper subset of $X$ is a $\gamma$-antecedent of $Y$, and

(3) no proper superset of $Y$ has $X$ as a $\gamma$-antecedent.

The basis we will focus on now is constructed from each $Y$ and each valid antecedent of $Y$; we consider this the clearest way to define and study it, and we explain below why it is essentially identical to two existing, independent proposals.

Definition 3.15. Fix a dataset $\mathcal{D}$ and a confidence threshold $\gamma$. The representative rules for $\mathcal{D}$ at confidence $\gamma$ are all the rules $X \to Y - X$ for all itemsets $Y$ and for all valid $\gamma$-antecedents $X$ of $Y$.
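A brute-force Python sketch of Definitions 3.14 and 3.15 follows. The dataset and function names are hypothetical illustrations; the sketch also assumes, as its own simplification, that only itemsets $Y$ with positive support are tried (a set with positive support is never a $\gamma$-antecedent of a set with zero support, so condition (3) is unaffected). It enumerates all subsets and is exponential in the number of items.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset, not the example of Figure 1.
DATASET = [frozenset(t) for t in ["ABC", "ABC", "AB", "ABD", "CD", "BD"]]


def support(itemset, dataset):
    """s(X): number of transactions containing every item of X."""
    return sum(1 for t in dataset if itemset <= t)


def subsets(itemset):
    """All subsets of an itemset, smallest first."""
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def representative_rules(gamma, dataset):
    """Return the representative rules as (X, Y - X) pairs.

    Only itemsets Y with positive support are tried (an assumption of
    this sketch).
    """
    universe = frozenset().union(*dataset)
    occurring = [y for y in subsets(universe) if support(y, dataset) > 0]

    def antecedent(x, y):  # Definition 3.12, for X a subset of Y
        return support(y, dataset) >= gamma * support(x, dataset)

    rules = []
    for y in occurring:
        for x in subsets(y):
            if x == y or not antecedent(x, y):
                continue  # X must be a proper subset satisfying (1)
            if any(antecedent(x2, y) for x2 in subsets(x) if x2 != x):
                continue  # violates (2): a proper subset of X also works
            if any(antecedent(x, y2) for y2 in occurring if y < y2):
                continue  # violates (3): a proper superset of Y also works
            rules.append((x, y - x))
    return rules


for lhs, rhs in representative_rules(Fraction(3, 4), DATASET):
    print("".join(sorted(lhs)) or "{}", "->", "".join(sorted(rhs)))
```

On this toy dataset at $\gamma = 3/4$ the sketch reports six rules, among them $\emptyset \to B$ (confidence $5/6$) and $AC \to B$; at $\gamma = 1$ only the four implications $A \to B$, $AC \to B$, $BC \to A$, and $AD \to B$ remain.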

In the following, we will say "let $X \to Y - X$ be a representative rule" to mean "let $Y$ be a set having valid $\gamma$-antecedents, and let $X$ be one of them"; the parameter $\gamma$ will always be clear from the context. Note that some sets $Y$ may not have valid antecedents, and then they do not generate any representative rules.

By the conditions on valid antecedents in representative rules, the following relatively simple but crucial property holds; beyond the use of our Corollary 3.9, the argument closely follows that of related facts in [29]:

Proposition 3.16. Let rule $X \to Y - X$ be among the representative rules for $\mathcal{D}$ at confidence $\gamma$. Assume that it is plainly redundant with respect to rule $X' \to Y'$, also of confidence at least $\gamma$; then, they are equivalent by reflexivity and, in case $X' \cap Y' = \emptyset$, they are the same rule.

Proof. Let $X \to Y - X$ be a representative rule, so that $X \subset Y$ and $X$ is a valid $\gamma$-antecedent of $Y$. By Corollary 3.9, $X' \to Y'$ must cover $X \to Y - X$: $X' \subseteq X \subseteq X(Y - X) = Y \subseteq X'Y'$.

As $c(X' \to Y') \geq \gamma$, $X'$ is a $\gamma$-antecedent of $X'Y'$. We first show that $Y = X'Y'$; assume $Y \subset X'Y'$, and apply Lemma 3.13 to $X' \subseteq X \subseteq Y \subseteq X'Y'$: $X'$ is also a $\gamma$-antecedent of $Y$, and the minimality of the valid $\gamma$-antecedent $X$ gives us $X = X'$. $X$ is, thus, a $\gamma$-antecedent of $X'Y'$, which properly includes $Y$, contradicting the third property of valid antecedents.

Hence, $Y = X'Y'$, so that $X'$ is a $\gamma$-antecedent of $X'Y' = Y$; but again $X$ is a minimal $\gamma$-antecedent of $X'Y' = Y$, so that necessarily $X = X'$, which, together with $X'Y' = Y = XY$, proves equivalence by reflexivity. Under the additional condition $X' \cap Y' = \emptyset$, both rules coincide as per Proposition 2.2. □



It easily follows that our definition is equivalent to the definition given in [27], except for a support bound that we will explain later; indeed, we will show in Section 4.5 that all our results carry over when a support bound is additionally enforced.

Corollary 3.17. Fix a dataset $\mathcal{D}$ and a confidence threshold $\gamma$. Let $X \subset Y$. The following are equivalent:

(1) Rule $X \to Y - X$ is among the representative rules for $\mathcal{D}$ at confidence $\gamma$;

(2) [27] $c(X \to Y - X) \geq \gamma$ and there does not exist any other rule $X' \to Y'$ with $X' \cap Y' = \emptyset$, of confidence at least $\gamma$ in $\mathcal{D}$, that covers $X \to Y - X$.

Proof. Let rule $X \to Y - X$ be among the representative rules for $\mathcal{D}$ at confidence $\gamma$, and let rule $X' \to Y'$ cover it, while being also of confidence at least $\gamma$ and with $X' \cap Y' = \emptyset$. Then, by Corollary 3.9, $X' \to Y'$ makes $X \to Y - X$ plainly redundant, and by Proposition 3.16 they must coincide. To show the converse, we must see that $X \to Y - X$ is a representative rule under the conditions given. The fact that $c(X \to Y - X) \geq \gamma$ gives that $X$ is a $\gamma$-antecedent of $Y$, and we must see its validity. Assume that a proper subset $X' \subset X$ is also a $\gamma$-antecedent of $Y$: then the rule $X' \to Y - X'$ would be a different rule of confidence at least $\gamma$ covering $X \to Y - X$, which cannot be. Similarly, assume that $X$ is a $\gamma$-antecedent of $Y'$ where $Y \subset Y'$: then the rule $X \to Y' - X$ would be a different rule of confidence at least $\gamma$ covering $X \to Y - X$, which cannot be either. □

Similarly, and with the same proviso regarding support, our definition is equivalent to the "essential rules" of [1]. There, the set of minimal $\gamma$-antecedents of a given itemset is termed its "boundary". The following statement is also easy to prove:

Corollary 3.18. Fix a dataset $\mathcal{D}$ and a confidence threshold $\gamma$. Let $X \subset Y$. The following are equivalent:

(1) Rule $X \to Y - X$ is among the representative rules for $\mathcal{D}$ at confidence $\gamma$;

(2) [1] $X$ is in the boundary of $Y$ but is not in the boundary of any proper superset of $Y$; that is, $X$ is a minimal $\gamma$-antecedent of $Y$ but is not a minimal $\gamma$-antecedent of any itemset strictly containing $Y$.

Proof. If $X \to Y - X$ is among the representative rules, $X$ must be a minimal $\gamma$-antecedent of $Y$ by the conditions of valid antecedents; also, $X$ is not a $\gamma$-antecedent at all (and, thus, not a minimal $\gamma$-antecedent) of any $Y'$ properly including $Y$. Conversely, assume that $X$ is in the boundary of $Y$ but is not in the boundary of any proper superset of $Y$; first, $X$ must be a minimal $\gamma$-antecedent of $Y$, so that the first two conditions of valid $\gamma$-antecedents hold. Assume that $X \to Y - X$ is not among the representative rules; the third property must fail, and $X$ must be a $\gamma$-antecedent of some $Y'$ with $Y \subset Y'$. Our hypotheses tell us that $X$ is not a minimal $\gamma$-antecedent of $Y'$. That is, there is a proper subset $X' \subset X$ that is also a $\gamma$-antecedent of $Y'$. It suffices to apply Lemma 3.13 to $X' \subset X \subseteq Y \subset Y'$ to reach a contradiction, since it implies that $X'$ is a $\gamma$-antecedent of $Y$, and therefore $X$ would not be a minimal $\gamma$-antecedent of $Y$. □
The representative rules are indeed a basis:

Fact 3.19. ([1], [27]) Fix a dataset $\mathcal{D}$ and a confidence threshold $\gamma$, and consider the set of representative rules constructed from $\mathcal{D}$; it is a complete basis:

(1) all the representative rules hold with confidence at least $\gamma$;

(2) all the rules of confidence at least $\gamma$ in $\mathcal{D}$ are plainly redundant with respect to the representative rules.

The first part follows directly from the use of $\gamma$-antecedents as left-hand sides of representative rules. For the second part, also almost immediate, suppose $c(X \to Y) \geq \gamma$, and let $Z = XY$; since $X$ is now a $\gamma$-antecedent of $Z$, it must contain a minimal $\gamma$-antecedent of $Z$, say $X' \subseteq X$. Let $Z'$ be a maximal superset of $Z$ such that $X'$ is still a $\gamma$-antecedent of $Z'$; by Lemma 3.13, no proper subset of $X'$ is a $\gamma$-antecedent of $Z'$ either. Thus, $X' \to Z' - X'$ is among the representative rules and covers $X \to Y$. Small examples of the construction of representative rules can be found in the same references; we also provide one below.
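The argument for part (2) can be tested by brute force. The Python sketch below, with a hypothetical toy dataset and hypothetical helper names, checks that every rule $X \to Y$ with $X \cap Y = \emptyset$, $Y \neq \emptyset$, $s(X) > 0$, and confidence at least $\gamma$ is covered by some representative rule $X' \to Y'$, that is, $X' \subseteq X$ and $XY \subseteq X'Y'$; by Corollary 3.9, covering yields plain redundancy. Only itemsets with positive support are considered, a simplification of this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset used only for illustration.
DATASET = [frozenset(t) for t in ["ABC", "ABC", "AB", "ABD", "CD", "BD"]]


def support(itemset, dataset):
    return sum(1 for t in dataset if itemset <= t)


def subsets(itemset):
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def representative_rules(gamma, dataset):
    """Brute-force representative rules (Definitions 3.14 and 3.15)."""
    universe = frozenset().union(*dataset)
    occurring = [y for y in subsets(universe) if support(y, dataset) > 0]

    def ant(x, y):
        return support(y, dataset) >= gamma * support(x, dataset)

    return [
        (x, y - x)
        for y in occurring
        for x in subsets(y)
        if x != y and ant(x, y)
        and not any(ant(x2, y) for x2 in subsets(x) if x2 != x)
        and not any(ant(x, y2) for y2 in occurring if y < y2)
    ]


def covers(rule1, rule0):
    """X1 -> Y1 covers X0 -> Y0 when X1 <= X0 and X0 Y0 <= X1 Y1."""
    (x1, y1), (x0, y0) = rule1, rule0
    return x1 <= x0 and (x0 | y0) <= (x1 | y1)


def check_fact_3_19(gamma, dataset):
    """Assert every rule of confidence >= gamma is covered; return the count."""
    basis = representative_rules(gamma, dataset)
    universe = frozenset().union(*dataset)
    checked = 0
    for x in subsets(universe):
        s_x = support(x, dataset)
        if s_x == 0:
            continue
        for y in subsets(universe - x):
            if y and support(x | y, dataset) >= gamma * s_x:
                assert any(covers(r, (x, y)) for r in basis), (x, y)
                checked += 1
    return checked


print(check_fact_3_19(Fraction(3, 4), DATASET))
```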

An analogous fact is proved in [38] through an incomplete deductive calculus consisting of the schemes that we have called (ℓA) and (rR), and states that every rule of confidence at least $\gamma$ can be inferred from the representative rules by application of these two inference schemes. Since representative rules in the formulation of [38] have a right-hand side that includes the left-hand side, this inference process does not need to employ (rA).

Now we can state and prove the most interesting novel property of this basis, which again follows from our main result in this section, Corollary 3.9. As indicated, representative rules were known to be irredundant with respect to simple and strict redundancy or, equivalently, with respect to covering. But, for standard redundancy, in principle there was actually the possibility that some other basis, constructed in an altogether different form, could have fewer rules. We can state and prove now that this is not so: there is absolutely no other way of constructing a basis smaller than this one, while preserving completeness with respect to plain redundancy, because it has absolutely minimum size among all complete bases. Therefore, in order to find smaller bases, a notion of redundancy more powerful than plain (or standard) redundancy is unavoidably necessary.

Theorem 3.20. Fix a dataset $\mathcal{D}$, and let $\mathcal{R}$ be the set of rules that hold with confidence at least $\gamma$ in $\mathcal{D}$. Let $\mathcal{B}' \subseteq \mathcal{R}$ be an arbitrary basis, complete so that all the rules in $\mathcal{R}$ are plainly redundant with respect to $\mathcal{B}'$. Then, $\mathcal{B}'$ must have at least as many rules as the representative rules. Moreover, if the rules in $\mathcal{B}'$ are such that antecedents and consequents are disjoint, then all the representative rules belong to $\mathcal{B}'$.

Proof. By the assumed completeness of $\mathcal{B}'$, each representative rule $X \to Y - X$ must be redundant with respect to some rule $X' \to Y' \in \mathcal{B}' \subseteq \mathcal{R}$. By Corollary 3.9, $X' \to Y'$ covers $X \to Y - X$. Then Proposition 3.16 applies: they are equivalent by reflexivity. This means $X = X'$ and $Y = X'Y'$; hence $X' \to Y'$ uniquely identifies which representative rule it covers, if any, and $\mathcal{B}'$ needs at least as many rules as the number of representative rules. Moreover, as stated also in Proposition 3.16, if the disjointness condition $X' \cap Y' = \emptyset$ holds, then both rules coincide. □

Example 3.21. We consider a small example consisting of 12 transactions, where there are 
actually only 7 itemsets, but some of them are repeated across several transactions. We can 
simplify our study as follows: if $X$ is not a closed set for the dataset, that is, if it has some superset $X' \supset X$ with the same support, then clearly it has no valid $\gamma$-antecedents (see also Fact 4.3 below); thus we concentrate on closed sets. Figure 1 shows the example dataset and the corresponding (semi-)lattice of closures, depicted as a Hasse diagram (that is, transitive edges have been removed to clarify the drawing); edges stand for the inclusion relationship.

Figure 1: Closed itemsets for a small example

For this example, the implications can be summarized by six rules, namely, $AC \Rightarrow B$, $BC \Rightarrow A$, $AD \Rightarrow B$, $BD \Rightarrow A$, $CF \Rightarrow D$, and $DF \Rightarrow C$, which are also the representative rules at confidence 1. At confidence $\gamma = 0.75$, we find that, first, the left-hand sides of the six implications are still valid $\gamma$-antecedents even at this lower confidence, so that the implications still belong to the representative basis. Then, we see that two of the closures, $ABC$ and $CD$, have additionally one valid $\gamma$-antecedent each, whereas $AB$ has two. The following four rules hold: $A \to B$, $B \to A$, $AB \to C$, and $D \to C$. These four rules, jointly with the six implications indicated, constitute exactly the ten representative rules at confidence 0.75.



4. Closure-Based Redundancy 



Theorem 3.20 in the previous section tells us that, for plain redundancy, the absolute 
limit of a basis at any given confidence threshold is reached by the set of representative 
rules. Several studies, prominently [H], have put forward a different notion of redundancy; 
namely, they give a separate role to the full-confidence implications, often through their 
associated closure operator. Along this way, one gets a stronger notion of redundancy and, 
therefore, a possibility that smaller bases can be constructed. 

Indeed, implications can be summarized better, because they allow for Transitivity and Augmentation to apply in order to find redundancies; moreover, they can be combined in certain forms of transitivity with partial rules: as a simple example, if $c(X \to Y) \geq \gamma$ and $c(Y \to Z) = 1$, that is, if a fraction $\gamma$ or more of the support of $X$ has $Y$ and all the transactions containing $Y$ do have $Z$ as well, clearly this implies that $c(X \to Z) \geq \gamma$. Observe, however, that the directionality is relevant: from $c(X \to Y) = 1$ and $c(Y \to Z) \geq \gamma$ we infer nothing about $c(X \to Z)$, since the high confidence of $Y \to Z$ might be due to a large number of transactions that do not include $X$.
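Both directions of this observation can be checked numerically. In the minimal Python sketch below, the dataset and helper names are hypothetical, and confidence is taken to be 1 when the antecedent has zero support (a convention of this sketch). `check_transitivity` exhaustively confirms the valid direction on a toy dataset, and the dataset `SKEWED` exhibits the failure of the reverse direction.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset used only for illustration.
DATASET = [frozenset(t) for t in ["ABC", "ABC", "AB", "ABD", "CD", "BD"]]


def support(itemset, dataset):
    return sum(1 for t in dataset if itemset <= t)


def confidence(x, y, dataset):
    """c(X -> Y) = s(XY) / s(X), taken as 1 when s(X) = 0."""
    s_x = support(x, dataset)
    return Fraction(support(x | y, dataset), s_x) if s_x else Fraction(1)


def subsets(itemset):
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def check_transitivity(gamma, dataset):
    """If c(X -> Y) >= gamma and c(Y -> Z) = 1, then c(X -> Z) >= gamma."""
    all_sets = list(subsets(frozenset().union(*dataset)))
    for x in all_sets:
        for y in all_sets:
            if confidence(x, y, dataset) < gamma:
                continue
            for z in all_sets:
                if confidence(y, z, dataset) == 1:
                    assert confidence(x, z, dataset) >= gamma, (x, y, z)
    return True


# Reverse direction: A -> B is exact and B -> C is strong, yet A -> C fails,
# because most transactions containing B do not contain A.
SKEWED = [frozenset("AB")] + [frozenset("BC")] * 9
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
print(check_transitivity(Fraction(3, 4), DATASET))  # True
print(confidence(A, B, SKEWED), confidence(B, C, SKEWED),
      confidence(A, C, SKEWED))  # 1 9/10 0
```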
We will need some notation about closures. Given a dataset $\mathcal{D}$, the closure operator associated to $\mathcal{D}$ maps each itemset $X$ to the largest itemset $\overline{X}$ that contains $X$ and has the same support as $X$ in $\mathcal{D}$: $s(\overline{X}) = s(X)$, and $\overline{X}$ is as large as possible under this condition. It is known and easy to prove that $\overline{X}$ exists and is unique. Implications that hold in the dataset correspond to the closure operator ([19], [23], [36], [33], [S]): $c(X \to \overline{X}) = 1$, and $\overline{X}$ is as large as possible under this condition. Equivalently, the closure of itemset $X$ is the intersection of all the transactions that contain $X$; this is because $X \subseteq \overline{X}$ implies that all transactions counted for the support of $\overline{X}$ are counted as well for the support of $X$; hence, if the support counts coincide, they must count exactly the same transactions.

Along this section, as in [36], we denote full-confidence implications using the standard logic notation $X_0 \Rightarrow Y_0$; thus, $X_0 \Rightarrow Y_0$ if and only if $Y_0 \subseteq \overline{X_0}$.

A basic fact from the theory of Closure Spaces is that closure operators are characterized by three properties: extensivity ($X \subseteq \overline{X}$), idempotency ($\overline{\overline{X}} = \overline{X}$), and monotonicity (if $X \subseteq Y$ then $\overline{X} \subseteq \overline{Y}$). As an example of the use of these properties, we note the following simple consequence for later use:



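Lemma 4.1. $XY \subseteq \overline{X}Y \subseteq \overline{X}\,\overline{Y} \subseteq \overline{XY}$, and $\overline{XY} = \overline{\overline{X}Y} = \overline{X\overline{Y}} = \overline{\overline{X}\,\overline{Y}} = \overline{\overline{XY}}$.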

We omit the immediate proof. A set is closed if it coincides with its closure. Usually we speak of the lattice of closed sets (technically it is just a semilattice, but it allows for a standard transformation into a lattice [14]). When $\overline{X} = Y$ we also say that $X$ is a generator of $Y$; if the closures of all proper subsets of $X$ are different from $Y$, we say that $X$ is a minimal generator. Note that some references use the term "generator" to mean our "minimal generator"; we prefer to make the minimality condition explicit in the name. In some works, often database-inspired, minimal generators are sometimes termed "keys". In other works, often matroid-inspired, they are also termed "free sets". Our definition says explicitly that $s(\overline{X}) = s(X)$. We will make liberal use of this fact, which is easy to check also with other existing alternative definitions of the closure operator, as stated in [36], [44], and others. Several quite good algorithms exist to find the closed sets and their supports (see section 4 of [12]).
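A minimal Python sketch of the closure operator of a dataset, computed as the intersection of the transactions containing $X$, together with an exhaustive check of the three closure properties and of Lemma 4.1 on a hypothetical toy dataset. Returning the set of all items when no transaction contains $X$ (the intersection of an empty family) is an assumption of this sketch, as are all names used.

```python
from itertools import combinations

# Hypothetical toy dataset used only for illustration.
DATASET = [frozenset(t) for t in ["ABC", "ABC", "AB", "ABD", "CD", "BD"]]


def support(itemset, dataset):
    return sum(1 for t in dataset if itemset <= t)


def closure(itemset, dataset):
    """Intersection of all transactions containing X (all items if none)."""
    containing = [t for t in dataset if itemset <= t]
    if not containing:
        return frozenset().union(*dataset) | itemset
    return frozenset.intersection(*containing)


def subsets(itemset):
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def check_closure_properties(dataset):
    sets = list(subsets(frozenset().union(*dataset)))
    cl = {s: closure(s, dataset) for s in sets}
    for x in sets:
        assert x <= cl[x]                                  # extensivity
        assert closure(cl[x], dataset) == cl[x]            # idempotency
        assert support(cl[x], dataset) == support(x, dataset)
        for y in sets:
            if x <= y:
                assert cl[x] <= cl[y]                      # monotonicity
            xy = x | y
            # Lemma 4.1, first part: a chain of inclusions.
            assert xy <= (cl[x] | y) <= (cl[x] | cl[y]) <= cl[xy]
            # Lemma 4.1, second part: all these sets share one closure.
            for w in (cl[x] | y, x | cl[y], cl[x] | cl[y], cl[xy]):
                assert closure(w, dataset) == cl[xy]
    return True


print(check_closure_properties(DATASET))  # True
print(sorted(closure(frozenset("A"), DATASET)))  # ['A', 'B']
```

In this toy dataset every transaction containing $A$ also contains $B$, so the implication $A \Rightarrow B$ holds and the closure of $\{A\}$ is $\{A, B\}$.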

Redundancy based on closures is a natural generalization of equivalence by reflexivity; it works as follows ([44], see also [29] and section 4 in [36]):

Lemma 4.2. Given a dataset and the corresponding closure operator, two partial rules $X_0 \to Y_0$ and $X_1 \to Y_1$ such that $\overline{X_0} = \overline{X_1}$ and $\overline{X_0Y_0} = \overline{X_1Y_1}$ have the same support and the same confidence.



The rather immediate reason is that $s(X_0) = s(\overline{X_0}) = s(\overline{X_1}) = s(X_1)$, and $s(X_0Y_0) = s(\overline{X_0Y_0}) = s(\overline{X_1Y_1}) = s(X_1Y_1)$. Therefore, groups of rules sharing the same closure of the antecedent, and the same closure of the union of antecedent and consequent, give cases of redundancy. On account of these properties, there are some proposals of basis constructions from closed sets in the literature, reviewed below. But the first fact that we must mention to relate the closure operator with our explanations so far is the following:

Fact 4.3. [28] Let $X \to Y - X$ be a representative rule as per Definition 3.15. Then $Y$ is a closed set and $X$ is a minimal generator.



The proof is direct from Definitions 3.14 and 3.15, and can be found in [28], [29], [38]. These references employ this property to improve on the earlier algorithms to compute the representative rules, which considered all the frequent sets, by restricting the exploration to closures and minimal generators. The authors of [40] also do the same, seemingly unaware that the algorithm in [28] already works just with closed itemsets. Fact 4.3 may cast doubt on whether closure-based redundancy can actually lead to smaller bases. We prove that this is sometimes the case, due to the fact that the redundancy notion itself changes and allows for a form of Transitivity, which we show can again take the form of a deductive calculus. Then, we will be able to refine the notion of valid antecedent of the previous section and provide a basis for which we can prove that it has the smallest possible size among the bases for partial rules, with respect to closure-based completeness. That is, we will reach the limit of closure-based redundancy in the same manner as we did for standard redundancy in the previous section.
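Lemma 4.2 can be checked exhaustively on a small dataset. The Python sketch below (hypothetical toy dataset and helper names; an itemset contained in no transaction gets confidence 1 as antecedent and the set of all items as closure, conventions of this sketch) groups all rules $X \to Y$ by the pair $(\overline{X}, \overline{XY})$ and asserts that the support $s(XY)$ and the confidence agree inside each group.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset used only for illustration.
DATASET = [frozenset(t) for t in ["ABC", "ABC", "AB", "ABD", "CD", "BD"]]


def support(itemset, dataset):
    return sum(1 for t in dataset if itemset <= t)


def confidence(x, y, dataset):
    s_x = support(x, dataset)
    return Fraction(support(x | y, dataset), s_x) if s_x else Fraction(1)


def closure(itemset, dataset):
    containing = [t for t in dataset if itemset <= t]
    if not containing:
        return frozenset().union(*dataset) | itemset
    return frozenset.intersection(*containing)


def subsets(itemset):
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def check_lemma_4_2(dataset):
    """Group rules by (closure(X), closure(XY)); return the number of groups."""
    sets = list(subsets(frozenset().union(*dataset)))
    groups = {}
    for x in sets:
        for y in sets:
            key = (closure(x, dataset), closure(x | y, dataset))
            value = (support(x | y, dataset), confidence(x, y, dataset))
            assert groups.setdefault(key, value) == value, (x, y)
    return len(groups)


# A -> C and AB -> C share both closures (AB and ABC), hence one confidence.
print(confidence(frozenset("A"), frozenset("C"), DATASET),
      confidence(frozenset("AB"), frozenset("C"), DATASET))  # 1/2 1/2
```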

4.1. Characterizing Closure-Based Redundancy. Let $\mathcal{B}$ be the set of implications in the dataset $\mathcal{D}$; alternatively, $\mathcal{B}$ can be any of the bases already known for implications in a dataset. In our empirical validations below we have used as $\mathcal{B}$ the Guigues-Duquenne basis, or GD-basis, which has been proved to be of minimum size [23], [43]. An apparently popular and interesting alternative, which has been rediscovered over and over in different guises, is the so-called iteration-free basis of [53], which coincides with the proposal in [37] and with the exact min-max basis of [36] (also sometimes called generic basis [29]); because of Fact 4.3, it also coincides exactly with the representative rules of confidence 1, that is: implications that are not plainly redundant with any other implication according to Definition 3.1. Also, it coincides with the "closed-key basis" for frequent sets in [39], which in principle is not intended as a basis for rules, and has a different syntactic sugar, but differs in essence from the iteration-free basis only in the fact that the support of each rule is explicitly recorded together with it.

Closure-based redundancy takes into account B as follows: 

Definition 4.4. Let $\mathcal{B}$ be a set of implications. Partial rule $X_0 \to Y_0$ has closure-based redundancy relative to $\mathcal{B}$ with respect to rule $X_1 \to Y_1$, denoted $\mathcal{B}, \{X_1 \to Y_1\} \models X_0 \to Y_0$, if any dataset $\mathcal{D}$ in which all the rules in $\mathcal{B}$ hold with confidence 1 gives $c_{\mathcal{D}}(X_0 \to Y_0) \geq c_{\mathcal{D}}(X_1 \to Y_1)$.

In some cases, it might happen that the dataset at hand does not satisfy any nontrivial 
rule with confidence 1; then, this notion will not be able to go beyond plain redundancy. 
However, it is usual that some full-confidence rules do hold, and, in these cases, as we shall 
see, closure-based redundancy may give more economical bases. More generally, all our 
results only depend on the implications reaching indeed full confidence in the dataset; but 
they are not required to capture all of these: the implications in B (with their consequences 
according to the Armstrong schemes) could constitute just a part of the full-confidence rules 
in the dataset. In particular, plain redundancy reappears by choosing $\mathcal{B} = \emptyset$, whether or not the dataset satisfies any full-confidence implication.

We continue our study by showing a necessary and sufficient condition for closure-based 
redundancy, along the same lines as the one in the previous section. 

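Theorem 4.5. Let $\mathcal{B}$ be a set of exact rules, with associated closure operator mapping each itemset $Z$ to its closure $\overline{Z}$. Let $X_0 \to Y_0$ be a rule not implied by $\mathcal{B}$, that is, where $Y_0 \not\subseteq \overline{X_0}$. Then, the following are equivalent:

(1) $X_1 \subseteq \overline{X_0}$ and $X_0Y_0 \subseteq \overline{X_1Y_1}$;

(2) $\mathcal{B}, \{X_1 \to Y_1\} \models X_0 \to Y_0$.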
Proof. The direct proof is simple: the inclusions given imply that $s(X_1) \geq s(\overline{X_0}) = s(X_0)$ and $s(X_0Y_0) \geq s(\overline{X_1Y_1}) = s(X_1Y_1)$; then $c(X_0 \to Y_0) = \frac{s(X_0Y_0)}{s(X_0)} \geq \frac{s(X_1Y_1)}{s(X_1)} = c(X_1 \to Y_1)$.

Conversely, for $Y_0 \not\subseteq \overline{X_0}$, we argue that, if either of $X_1 \subseteq \overline{X_0}$ and $X_0Y_0 \subseteq \overline{X_1Y_1}$ fails, then there is a dataset where $\mathcal{B}$ holds with confidence 1 and $X_1 \to Y_1$ holds with high confidence but the confidence of $X_0 \to Y_0$ is low.

We observe first that, in order to satisfy B, it suffices to make sure that all the trans- 
actions in the dataset we are to construct are closed sets according to the closure operator 
corresponding to B. 

Assume now that $X_1 \not\subseteq \overline{X_0}$: then a dataset consisting only of one or more transactions with itemset $\overline{X_0}$ satisfies (vacuously) $X_1 \to Y_1$ with confidence 1 but, given that $Y_0 \not\subseteq \overline{X_0}$, leads to confidence zero for $X_0 \to Y_0$. It is also possible to argue without resorting to vacuous satisfaction: simply take one transaction consisting of $\overline{X_1Y_1}$ and, in case this transaction satisfies $X_0 \to Y_0$, obtain as low a confidence as desired for $X_0 \to Y_0$ by adding as many transactions $\overline{X_0}$ as necessary; these will not change the confidence of $X_1 \to Y_1$, since $X_1 \not\subseteq \overline{X_0}$.

Then consider the case where $X_1 \subseteq \overline{X_0}$, whence the other inclusion fails: $X_0Y_0 \not\subseteq \overline{X_1Y_1}$. Consider a dataset of, say, $n$ transactions, where one transaction consists of the itemset $\overline{X_0}$ and $n - 1$ transactions consist of the itemset $\overline{X_1Y_1}$. The confidence of $X_1 \to Y_1$ is at least $\frac{n-1}{n}$, which can be made as close to 1 as desired by increasing $n$, whereas the presence of at least one $\overline{X_0}$ and no transaction at all containing $X_0Y_0$ gives confidence zero to $X_0 \to Y_0$. Thus, in either case, we see that redundancy does not hold. □
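Theorem 4.5 reduces closure-based redundancy to two inclusion tests, and its proof builds explicit counterexample datasets. The Python sketch below illustrates both under stated assumptions: the names `closure_from_implications`, `closure_redundant`, and `counterexample` are hypothetical; $\mathcal{B}$ is given as a list of pairs `(lhs, rhs)` standing for $lhs \Rightarrow rhs$; and closures are computed by the usual fixpoint (forward chaining) over these implications. When $Y_0 \subseteq \overline{X_0}$ the test answers `True` directly, since $X_0 \to Y_0$ then has confidence 1 in every dataset satisfying $\mathcal{B}$.

```python
from fractions import Fraction


def closure_from_implications(itemset, implications):
    """Smallest superset of X closed under every implication lhs => rhs."""
    result = set(itemset)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in implications:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def closure_redundant(premise, conclusion, implications):
    """Decide B, {X1 -> Y1} |= X0 -> Y0 through the inclusions of Theorem 4.5."""
    (x1, y1), (x0, y0) = premise, conclusion
    cl_x0 = closure_from_implications(x0, implications)
    if y0 <= cl_x0:
        return True  # implied by B alone: confidence 1 in every such dataset
    cl_x1y1 = closure_from_implications(x1 | y1, implications)
    return x1 <= cl_x0 and (x0 | y0) <= cl_x1y1


def counterexample(premise, conclusion, implications, n=10):
    """The n-transaction dataset of the proof, for a non-redundant pair."""
    (x1, y1), (x0, y0) = premise, conclusion
    cl_x0 = closure_from_implications(x0, implications)
    cl_x1y1 = closure_from_implications(x1 | y1, implications)
    if not x1 <= cl_x0:
        return [cl_x1y1] + [cl_x0] * (n - 1)  # first case of the proof
    return [cl_x0] + [cl_x1y1] * (n - 1)      # second case of the proof


def confidence(x, y, dataset):
    """c(X -> Y) = s(XY) / s(X), taken as 1 when s(X) = 0."""
    s_x = sum(1 for t in dataset if x <= t)
    s_xy = sum(1 for t in dataset if (x | y) <= t)
    return Fraction(s_xy, s_x) if s_x else Fraction(1)


A, C, D = frozenset("A"), frozenset("C"), frozenset("D")
AB = frozenset("AB")
IMPL = [(A, frozenset("B"))]  # hypothetical B = {A => B}
print(closure_redundant((A, C), (AB, C), IMPL))  # True, thanks to A => B
print(closure_redundant((A, C), (AB, C), []))    # False without B
data = counterexample((A, C), (AB, D), IMPL)
print(confidence(A, C, data), confidence(AB, D, data))  # 9/10 0
```

Since every transaction produced by `counterexample` is closed under $\mathcal{B}$, all implications of $\mathcal{B}$ hold in it with confidence 1, exactly as the proof requires.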



4.2. Deduction Schemes for Closure-Based Redundancy. We provide now a stronger 
calculus that is sound and complete for this more general case of closure-based redundancy. 
For clarity, we chose to avoid the closure operator in our deduction schemes, writing instead 
explicitly each implication. 

Our calculus for closure-based redundancy consists of four inference schemes, each of which reaches a partial rule from premises including a partial rule. Two of the schemes correspond to variants of Augmentation, one for enlarging the antecedent, the other for enlarging the consequent. The other two correspond to composition with an implication, one in the antecedent and one in the consequent: a form of controlled transitivity. Their names (rA), (ℓA), (rI), and (ℓI) indicate whether they operate at the right or left-hand side and whether their effect is Augmentation or composition with an Implication.



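$$(\mathrm{rA})\;\;\frac{X \to Y,\quad XY \Rightarrow Z}{X \to YZ} \qquad\qquad (\mathrm{rI})\;\;\frac{X \to Y,\quad Y \Rightarrow Z}{X \to Z}$$

$$(\ell\mathrm{A})\;\;\frac{X \to YZ}{XY \to Z} \qquad\qquad (\ell\mathrm{I})\;\;\frac{X \to Y,\quad Z \subseteq X,\quad Z \Rightarrow X}{Z \to Y}$$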

Again, we allow rules with an empty right-hand side to be stated directly:

$$(\mathrm{r}\emptyset)\;\;\frac{}{X \to \emptyset}$$

Alternatively, we could state trivial rules with a subset of the left-hand side at the right-hand side. Note that this opens the door to using (rA) with an empty $Y$, and this allows us to "downgrade" an implication into the corresponding partial rule. Again, (ℓA) could be stated equivalently as (ℓA') like in Section 3.1. In fact, the whole connection with the simpler calculus in Section 3.1 should be easy to understand: first, observe that the (ℓA) rules are identical. Now, if implications are not considered separately, the closure operator trivializes to identity, $\overline{Z} = Z$ for every $Z$, and the only cases where we know that $X_1 \Rightarrow Y_1$ are those where $Y_1 \subseteq X_1$; we see that (rI) corresponds, in that case, to (rR), whereas the (rA) schemes only differ on cases of equivalence by reflexivity. Finally, in that case (ℓI) becomes fully trivial, since $Z \Rightarrow X$ becomes $X \subseteq Z$ and, together with $Z \subseteq X$, leads to $X = Z$: then, the partial rules above and below the line would coincide.

Similarly to the plain case, there exists an alternative deduction system, more compact, 
whose equivalence with our four schemes is rather easy to see. It consists of just two forms 
of combining a partial rule with an implication: 



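$$(\mathrm{rI'})\;\;\frac{X \to Y,\quad XY \Rightarrow Z}{X \to Z} \qquad\qquad (\ell\mathrm{I'})\;\;\frac{X \to Y,\quad Z \subseteq XY,\quad Z \Rightarrow X}{Z \to Y}$$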

However, in our opinion, the use of these schemes in our further developments is less 
intuitive, so we keep working with the four schemes above. 

In the remainder of this section, we denote as $\mathcal{B}, \{X \to Y\} \vdash X' \to Y'$ the fact that, in the presence of the implications in the set $\mathcal{B}$, rule $X' \to Y'$ can be derived from rule $X \to Y$ using zero or more applications of the four deduction schemes; along such a derivation, any rule of $\mathcal{B}$ (or derived from $\mathcal{B}$ by the Armstrong schemes) can be used whenever an implication of the form $X \Rightarrow Y$ is required.
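The soundness of the four schemes, proved below in Theorem 4.6, can be sanity-checked on a concrete dataset. In the minimal Python sketch below (hypothetical toy dataset and helper names), $\mathcal{B}$ is taken to be all the implications holding in that dataset, so that $V \Rightarrow W$ holds exactly when $W \subseteq \overline{V}$. Every instance of each scheme over the items is enumerated, and the confidence below the line is compared with the one above it; antecedents of zero support get confidence 1, and itemsets contained in no transaction get the set of all items as closure, both conventions of this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset used only for illustration.
DATASET = [frozenset(t) for t in ["ABC", "ABC", "AB", "ABD", "CD", "BD"]]


def subsets(itemset):
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def check_schemes(dataset):
    """Assert that no scheme instance lowers confidence; return True."""
    universe = frozenset().union(*dataset)
    sets = list(subsets(universe))
    sup = {s: sum(1 for t in dataset if s <= t) for s in sets}
    cl = {}
    for s in sets:
        containing = [t for t in dataset if s <= t]
        cl[s] = frozenset.intersection(*containing) if containing else universe

    def c(x, y):  # confidence, 1 by convention when s(X) = 0
        return Fraction(sup[x | y], sup[x]) if sup[x] else Fraction(1)

    for x in sets:
        for y in sets:
            for z in sets:
                if z <= cl[x | y]:               # (rA) with XY => Z
                    assert c(x, y | z) == c(x, y)
                if z <= cl[y]:                   # (rI) with Y => Z
                    assert c(x, z) >= c(x, y)
                assert c(x | y, z) >= c(x, y | z)  # (lA)
                if z <= x <= cl[z]:              # (lI) with Z <= X, Z => X
                    assert c(z, y) >= c(x, y)
    return True


print(check_schemes(DATASET))  # True
```

As for (rA), the sketch checks equality of confidences, matching the claim that the rules above and below the line have the same confidence.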

4.3. Soundness and Completeness. We can characterize the deductive power of this 
calculus as follows: it is sound and complete with respect to the notion of closure-based 
redundancy; that is, all the rules it can prove are redundant, and all the redundant rules 
can be proved: 

Theorem 4.6. Let $\mathcal{B}$ consist of implications. Then, $\mathcal{B}, \{X_1 \to Y_1\} \vdash X_0 \to Y_0$ if and only if rule $X_0 \to Y_0$ has closure-based redundancy relative to $\mathcal{B}$ with respect to rule $X_1 \to Y_1$: $\mathcal{B}, \{X_1 \to Y_1\} \models X_0 \to Y_0$.

Proof. Soundness corresponds to the fact that every rule derived is redundant: it suffices to prove it individually for each scheme; the essentials of some of these arguments are also found in the literature. For (rA), the inclusions $XY \subseteq XYZ \subseteq \overline{XY}$ prove that the partial rules above and below the line have the same confidence. For (rI), one has $XZ \subseteq X\overline{Y} \subseteq \overline{XY}$, thus $s(XZ) \geq s(XY)$, and the confidence of the rule below the line is at least that of the one above, or possibly greater. Scheme (ℓA) is unchanged from the previous section. Finally, for (ℓI), we have $Z \subseteq X \subseteq \overline{Z}$, so that $s(Z) = s(X)$, and $ZY \subseteq XY$, so that $s(ZY) \geq s(XY)$, and again the confidence of the rule below the line is at least the same as the confidence of the one above.

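To prove completeness, we must see that all redundant rules can be derived. We assume $\mathcal{B}, \{X_1 \to Y_1\} \models X_0 \to Y_0$ and resort to Theorem 4.5: we know that the inclusions $X_1 \subseteq \overline{X_0}$ and $X_0Y_0 \subseteq \overline{X_1Y_1}$ must hold. From Lemma 4.1, we have that $\overline{X_0}Y_0 \subseteq \overline{X_1Y_1}$.

Now we can write a derivation in our calculus, taking into account these inclusions, as follows:

$$X_1 \to Y_1 \;\vdash_{(\mathrm{rA})}\; X_1 \to \overline{X_1Y_1} \;\vdash_{(\mathrm{rI})}\; X_1 \to \overline{X_0}Y_0 \;\vdash_{(\ell\mathrm{A})}\; \overline{X_0} \to Y_0 \;\vdash_{(\ell\mathrm{I})}\; X_0 \to Y_0$$

Thus, indeed the redundant rule is derivable, which proves completeness. □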
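The completeness derivation can be replayed mechanically. The Python sketch below is a hypothetical illustration (the names `closure_from_implications` and `derive` are not from this paper): given $\mathcal{B}$ as a list of `(lhs, rhs)` implications and two rules satisfying the inclusions of Theorem 4.5, it produces the four steps (rA), (rI), (ℓA), (ℓI) and checks the side condition of each one with a forward-chaining closure.

```python
def closure_from_implications(itemset, implications):
    """Smallest superset of X closed under every implication lhs => rhs."""
    result = set(itemset)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in implications:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def derive(premise, conclusion, implications):
    """Return the steps (scheme name, rule) of the completeness derivation."""
    (x1, y1), (x0, y0) = premise, conclusion

    def cl(s):
        return closure_from_implications(s, implications)

    if not (x1 <= cl(x0) and (x0 | y0) <= cl(x1 | y1)):
        raise ValueError("the inclusions of Theorem 4.5 fail")
    steps = [("premise", (x1, y1))]
    # (rA): X1 -> Y1 and X1Y1 => closure(X1Y1) give X1 -> closure(X1Y1).
    big = cl(x1 | y1)
    steps.append(("rA", (x1, y1 | big)))
    # (rI): closure(X1Y1) => closure(X0)Y0, which holds by Lemma 4.1.
    target = cl(x0) | y0
    assert target <= cl(y1 | big)
    steps.append(("rI", (x1, target)))
    # (lA): move closure(X0) to the left-hand side; X1 is already inside it.
    assert x1 | cl(x0) == cl(x0)
    steps.append(("lA", (cl(x0), y0)))
    # (lI): X0 is contained in closure(X0), and X0 => closure(X0).
    assert x0 <= cl(x0)
    steps.append(("lI", (x0, y0)))
    return steps


IMPL = [(frozenset("A"), frozenset("B"))]  # hypothetical B = {A => B}
for name, (lhs, rhs) in derive((frozenset("A"), frozenset("C")),
                               (frozenset("AB"), frozenset("C")), IMPL):
    print(name, "".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

With $\mathcal{B} = \{A \Rightarrow B\}$, the sketch derives $AB \to C$ from $A \to C$ through the intermediate rule $A \to ABC$, a derivation that plain redundancy alone would not allow.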
4.4. Optimum-Size Basis for Closure-Based Redundancy. In a similar way as we
did for plain redundancy, we study here bases corresponding to closure-based redundancy. 

Since the implications become "factored out" thanks to the stronger notion of redun- 
dancy, we can focus on the partial rules. A formal definition of completeness for a basis is, 
therefore, as follows: 

Definition 4.7. Given a set of partial rules $\mathcal{R}$ and a set of implications $\mathcal{B}$, closure-based completeness of a set of partial rules $\mathcal{B}' \subseteq \mathcal{R}$ holds if every partial rule of $\mathcal{R}$ has closure-based redundancy relative to $\mathcal{B}$ with respect to some rule of $\mathcal{B}'$.

Again $\mathcal{R}$ is intended to be the set of all the partial rules "mined from" a given dataset $\mathcal{D}$ at a confidence threshold $\gamma < 1$ (recall that always $\gamma > 0$), whereas $\mathcal{B}$ is intended to be the subset of rules in $\mathcal{R}$ that hold with confidence 1 in $\mathcal{D}$ or, rather, a basis for these
implications. There exist several proposals for constructing bases while taking into account 
the implications and their closure operator. We use the same intuitions and modus operandi 
to add a new proposal which, conceptually, departs only slightly from existing ones. Its 
main merit is not the conceptual novelty of the basis itself but the mathematical proof that 
it achieves the minimum possible size for a basis with respect to closure-based redundancy, 
and is therefore at most as large as any alternative basis and, in many cases, smaller than 
existing ones. 

Our new basis is constructed as follows. For each closed set $Y$, we will consider a number of closed sets $X$ properly included in $Y$ as candidates to act as antecedents:

Definition 4.8. Fix a dataset $\mathcal{D}$, and consider the closure operator corresponding to the implications that hold in $\mathcal{D}$ with confidence 1. For each closed set $Y$, a closed proper subset $X \subset Y$ is a basic $\gamma$-antecedent if the following holds:

(1) $X$ is a $\gamma$-antecedent of $Y$: $s(Y) \geq \gamma s(X)$;

(2) no proper closed subset of $X$ is a $\gamma$-antecedent of $Y$, and

(3) no proper closed superset of $Y$ has $X$ as a $\gamma$-antecedent.

Basic antecedents follow essentially the same pattern as the valid antecedents (Definition 3.14), but restricted to closed sets only; that is, instead of minimal antecedents, we pick just minimal closed antecedents. Then we can use them as before:

Definition 4.9. Fix a dataset $\mathcal{D}$ and a confidence threshold $\gamma$.

(1) The basis $\mathcal{B}^\star_\gamma$ consists of all the rules $X \to Y - X$ for all closed sets $Y$ and all basic $\gamma$-antecedents $X$ of $Y$.

(2) A minmax variant of the basis $\mathcal{B}^\star_\gamma$ is obtained by replacing each left-hand side in $\mathcal{B}^\star_\gamma$ by a minimal generator: that is, for a closed set $Y$, each rule $X \to Y - X$ becomes $X' \to Y - X$ for one minimal generator $X'$ of the (closed) basic $\gamma$-antecedent $X$.

(3) A minmin variant of the basis $\mathcal{B}^\star_\gamma$ is obtained by replacing by a minimal generator both the left-hand and the right-hand sides in $\mathcal{B}^\star_\gamma$: for each closed set $Y$ and each basic $\gamma$-antecedent $X$ of $Y$, the rule $X \to Y - X$ becomes $X' \to Y' - X$, where $Y'$ is chosen a minimal generator of $Y$ and $X'$ is chosen a minimal generator of $X$.
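A brute-force Python sketch of Definitions 4.8 and 4.9(1) follows, over a hypothetical toy dataset. Closures are intersections of the containing transactions, only closed sets of positive support are enumerated (an assumption of this sketch), and the function name `basic_rules` is illustrative only; an efficient implementation would rather traverse the closed sets produced by a closed-itemset miner.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy dataset used only for illustration.
DATASET = [frozenset(t) for t in ["ABC", "ABC", "AB", "ABD", "CD", "BD"]]


def support(itemset, dataset):
    return sum(1 for t in dataset if itemset <= t)


def closure(itemset, dataset):
    containing = [t for t in dataset if itemset <= t]
    if not containing:
        return frozenset().union(*dataset) | itemset
    return frozenset.intersection(*containing)


def subsets(itemset):
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def basic_rules(gamma, dataset):
    """Rules X -> Y - X for closed Y and basic gamma-antecedents X of Y."""
    universe = frozenset().union(*dataset)
    closed = {closure(s, dataset) for s in subsets(universe)
              if support(s, dataset) > 0}
    closed = sorted(closed, key=lambda s: (len(s), sorted(s)))

    def antecedent(x, y):
        return support(y, dataset) >= gamma * support(x, dataset)

    rules = []
    for y in closed:
        for x in closed:
            if not (x < y and antecedent(x, y)):
                continue  # X must be a closed proper subset satisfying (1)
            if any(x2 < x and antecedent(x2, y) for x2 in closed):
                continue  # violates (2)
            if any(y < y2 and antecedent(x, y2) for y2 in closed):
                continue  # violates (3)
            rules.append((x, y - x))
    return rules


for lhs, rhs in basic_rules(Fraction(3, 4), DATASET):
    print("".join(sorted(lhs)) or "{}", "->", "".join(sorted(rhs)))
```

On this toy dataset at $\gamma = 3/4$ the sketch keeps only $\emptyset \to B$ and $B \to A$: rules such as $AC \to B$ have confidence 1 there and are left to $\mathcal{B}$, whereas the representative rules of the same dataset would also list them.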

The variants are defined only for the purpose of discussing the relationship to previous works along the next few paragraphs; generally, we will use only the first version of $\mathcal{B}^\star_\gamma$. Note the following: in a minmax variant, at the time of substituting a generator for the left-hand side closure, in case we consider a rule from $\mathcal{B}^\star_\gamma$ that has a left-hand side with several minimal generators, only one of them is to be used. Also, all of $X$ (and not only $X'$) can be removed from the right-hand side: (rA) can be used to recover it.

The basis $\mathcal{B}^\star_\gamma$ is uniquely determined by the dataset and the confidence threshold, but the variants can be constructed, in general, in several ways, because each closed set in the rule may have several minimal generators, and even several different generators of minimum size. We can see the variants as applications of our deduction schemes. The result of substituting a generator for the left-hand side of a rule is equivalent to the rule itself: in one direction it is exactly scheme (ℓI), and in the other it is a chained application of (rA) to add the closure to the right-hand side and (ℓA) to put it back in the left-hand side. Substituting a generator for the right-hand side corresponds to scheme (rI) in both directions.

The use of generators instead of closed sets in the rules is discussed in several references, such as [36] or [33]. In the style of [36], we would consider a minmax variant, which allows one to show to the user minimal sets of antecedents together with all their nontrivial consequents. In the style of [44], we would consider a minmin variant, thus reducing the total number of symbols if minimum-size generators are used, since we can pick any generator. Each of these known bases incurs a risk of picking more than one minimum generator for the same closure as left-hand sides of rules with the same closure of the right-hand side: this is where they may be (and, in actual cases, have been empirically found to be) larger than $\mathcal{B}^\star_\gamma$, because, in a sense, they would keep in the basis all the variants. Facts analogous to Corollaries 3.17 and 3.18 hold as well if the closure condition is added throughout, and provide further alternative definitions of the same basis. We use one of them in our experimental setting, described in Section 4.6. We now see that this set of rules entails exactly the rules that reach the corresponding confidence threshold in the dataset:

Theorem 4.10. Fix a dataset $\mathcal{D}$ and a confidence threshold $\gamma$. Let $\mathcal{B}$ be any basis for implications that hold with confidence 1 in $\mathcal{D}$.

(1) All the rules in $\mathcal{B}^\star_\gamma$ hold with confidence at least $\gamma$.

(2) $\mathcal{B}^\star_\gamma$ is a complete basis for the partial rules under closure-based redundancy.

Proof. All the rules in $\mathcal{B}^\star_\gamma$ must hold indeed, because all the left-hand sides are actually $\gamma$-antecedents. To prove that all the partial rules that hold are entailed by rules in $\mathcal{B}^\star_\gamma$, assume that indeed $X \to Y$ holds with confidence at least $\gamma$, that is, $s(\overline{XY}) = s(XY) \geq \gamma s(X) = \gamma s(\overline{X})$; thus $\overline{X}$ is a $\gamma$-antecedent of $\overline{XY}$. If $Y \subseteq \overline{X}$, then $c(X \to Y) = 1$ and the implication will follow from $\mathcal{B}$; we have to discuss only the case where $Y \not\subseteq \overline{X}$, which implies that $\overline{X} \subset \overline{XY}$. Consider the family of closed sets that include $XY$ and have $\overline{X}$ as $\gamma$-antecedent; it is a nonempty family, since $\overline{XY}$ fulfills these conditions. Pick $Z$ maximal in that family. Then $\overline{X} \subset Z = \overline{Z}$, since $Z$ is closed and $\overline{X} \subset \overline{XY} \subseteq Z$. Now, $\overline{X}$ is a $\gamma$-antecedent of $Z$, but not of any strictly larger closed itemset. Also, any subset of $\overline{X}$ is a proper subset of $Z$.

Let A' C A be closed, a 7-antecedent of Z, and minimal with respect to these prop- 
erties; assume that A' is a 7-a nteced ent of a closed set Z' strictly larger than Z. From 



X' C A C Z C Z' and Lemma 3.13 A would be also a 7-antecedent of Z' , which would 
contradict the maximality of Z. Therefore, A' cannot be a 7-antecedent of a closed set 
strictly larger than Z and, together with the facts that define A', we have that X' is a basic 
7-antecedent of Z whence A' — > Z — X' G B*. 



We gather the following inequalities: A' C A and XY C Z = Z = X'(Z - A'); this is 



exactly what we need to infer that B, {A' — > Z — A'} |= A — > Y from Theorem 4.5. D 
Now we can move to the main result of this section: this basis has a minimum number of rules among all bases that are complete for the partial rules, according to closure-based redundancy with respect to B.

Theorem 4.11. Fix a dataset $\mathcal{D}$, and let $\mathcal{R}$ be the set of rules that hold with confidence at least $\gamma$ in $\mathcal{D}$. Let B be a basis for the set of implications in $\mathcal{R}$. Let $B' \subseteq \mathcal{R}$ be an arbitrary basis, having closure-based completeness for $\mathcal{R}$ with respect to B. Then, B' must have at least as many rules as B*.

Proof. First, we will prove the following intermediate claim: for each partial rule in B*, say $X \to Y - X$, there is in B' a corresponding partial rule of the form $X' \to Y'$ with $\overline{X'Y'} = Y$ and $\overline{X'} = X$. We pick any rule $X \to Y - X \in B^*$, that is, where $X$ is a basic $\gamma$-antecedent of $Y$. This rule must be redundant, relative to the implications in B, with respect to the new basis B' under consideration: for some rule $X' \to Y' \in B'$, we have $B, \{X' \to Y'\} \models X \to Y - X$. By Theorem 4.5, this is the same as $X' \subseteq \overline{X} = X$ and $Y \subseteq \overline{X'Y'}$, together with $c(X' \to Y') \geq \gamma$. We consider some support ratios; since $X' \subseteq X$ gives $s(X) \leq s(X')$,

$$\frac{s(\overline{X'Y'})}{s(X)} = \frac{s(X'Y')}{s(X)} \geq \frac{s(X'Y')}{s(X')} = c(X' \to Y') \geq \gamma,$$

which means that $X$ is a $\gamma$-antecedent of $\overline{X'Y'}$, a closed set including $Y$. By the second condition in the definition of basic $\gamma$-antecedent, this cannot be the case unless $\overline{X'Y'} = Y$.

Then, again, $c(X' \to Y) = c(X' \to \overline{X'Y'}) = c(X' \to Y') \geq \gamma$, that is, $X'$ is a $\gamma$-antecedent of $Y$, and so is $\overline{X'} \subseteq \overline{Y} = Y$. But $\overline{X'} \subseteq \overline{X} = X$ and, by minimality of $X$ as a basic $\gamma$-antecedent of $Y$, it must be that $\overline{X'} = X$.

Now, to complete the proof of the theorem, we observe that each such rule $X' \to Y'$ in B' univocally determines both closed sets $X = \overline{X'}$ and $Y = \overline{X'Y'}$. Hence the same rule in B' cannot correspond to more than one of the rules in B*. Therefore, B' must have at least as many rules as B*. □

In applications of B*, one needs, in general, as a basis both B* and a basis for the 
implications, such as the GD-basis. On the other hand, in many practical cases, implications 
provide little new knowledge, most often just showing existing (and known) properties of 
the attributes. If a user is satisfied with the B* basis, and does not ask for a basis for the 
implications nor the representative rules, then (s)he may get results faster, since in this case 
the algorithms would not need to compute minimal generators, and just mining closures 
and their supports (and organizing them via the subset relation) would suffice. 

Note that the joint consideration of the GD-basis and B* incurs the risk of being a larger set of rules than the representative rules. This is because some rules in the GD-basis could, in fact, be plainly redundant (ignoring the closure-related issues) with a representative rule. We have observed empirically that, at high confidence thresholds, the representative rules tend to be a large basis due to the lack of specific minimization of implications, whereas the union of the GD-basis and B* tends to be considerably smaller. Conversely, at lower confidence levels, the availability of many partial rules increases the chances of covering a large part of the GD-basis. As a result, the representative rules are a smaller basis than the union of B* and the GD-basis, even if they outnumber B*. That is, closure-based redundancy may be either stronger or weaker than plain redundancy, in terms of the optimum basis sizes. Sometimes, B* even fully coincides with the partial representative rules. This is, in fact, illustrated in the following example.



Example 4.12. We revisit the example in Figure 1. As indicated at the end of Section 3.2, the basis for implications consists of six rules: $AC \Rightarrow B$, $AD \Rightarrow B$, $BC \Rightarrow A$, $BD \Rightarrow A$, $CF \Rightarrow D$, and $DF \Rightarrow C$. The iteration-free basis [33] and the Guigues-Duquenne basis [23] coincide here, and these implications are also the representative rules at confidence 1. At confidence $\gamma = 0.75$, these are kept and four representative rules are added: $A \to B$, $B \to A$, $AB \to C$, and $D \to C$. The four left-hand sides are, actually, closed sets, which is not guaranteed in general. Therefore the basis B* at this confidence includes exactly these four rules: no other closure is a basic $\gamma$-antecedent.

However, if the confidence threshold is lowered to $\gamma = 0.6$, we find seven rules in the B* basis: $A \to BC$, $B \to AC$, $C \to D$, $D \to C$, $CD \to F$, and $F \to CD$, plus the somewhat peculiar $\emptyset \to C$, since the support of $C$ is indeed above the same threshold. The rules $A \to B$, $B \to A$, and $AB \to C$ also hold, but they are redundant with respect to $A \to BC$ or $B \to AC$. On the one hand, $A$ and $B$ are $\gamma$-antecedents of $AB$ but are not basic, by way of also being $\gamma$-antecedents of $ABC$. On the other hand, $AB$ is a $\gamma$-antecedent of $ABC$ but is not basic either, since it is not minimal.

Additionally, the sizes of the rules can be reduced somewhat: $A \to C$ suffices to give $A \to BC$, or indeed $A \to ABC$. This is because $A \to C$ is equivalent by reflexivity to $A \to AC$, and the full-confidence implication $AC \Rightarrow B$ in the GD-basis gives us $A \to ABC$. This form of reasoning is due to [H], and a similar argument can be made for several of the other rules. Alternatively, one may omit those implications that, seen as partial rules, are already covered by a partial rule. In this example, these are $AC \Rightarrow B$ and $BC \Rightarrow A$, covered by $A \to BC$ and $B \to AC$ respectively (but $AC \Rightarrow B$ is not covered by $A \to C$, which needs $AC \Rightarrow B$ to infer $A \to BC$). Similarly, $CF \Rightarrow D$ and $DF \Rightarrow C$ are plainly redundant with $F \to CD$. In fact, it can be readily checked that the following together form exactly the representative rules at this confidence threshold: the seven partial rules in the B* basis at $\gamma = 0.6$, plus the two remaining implications in the GD-basis, $AD \Rightarrow B$ and $BD \Rightarrow A$.
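The redundancy checks used in this example can be mechanized. The following Python sketch decides whether a single partial rule $X' \to Y'$, together with a set B of implications, entails $X \to Y$. It uses the conditions that the proofs in this section invoke from Theorem 4.5: either $Y \subseteq \overline{X}$, or both $X' \subseteq \overline{X}$ and $XY \subseteq \overline{X'Y'}$. The sketch makes several assumptions:

- The closure is taken with respect to B and computed by naive forward chaining.
- The confidence requirement on the premise is assumed rather than checked.
- Passing an empty list of implications gives plain redundancy.
- The names `rule`, `closure` and `entails` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (left-hand side, right-hand side)


def rule(lhs: str, rhs: str) -> Rule:
    """Build a rule from one-letter item names, e.g. rule("AC", "B") for AC => B."""
    return frozenset(lhs), frozenset(rhs)


def closure(x: Itemset, implications: Iterable[Rule]) -> Itemset:
    """Smallest superset of x closed under the implications (naive forward chaining)."""
    rules: List[Rule] = list(implications)
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def entails(premise: Rule, conclusion: Rule, implications: Iterable[Rule]) -> bool:
    """Does B, {X' -> Y'} |= X -> Y hold under closure-based redundancy?"""
    rules = list(implications)
    (xp, yp), (x, y) = premise, conclusion
    cx = closure(x, rules)
    if y <= cx:  # X => Y already follows from the implications alone
        return True
    return xp <= cx and (x | y) <= closure(xp | yp, rules)


# The six implications of Example 4.12.
GD = [rule("AC", "B"), rule("AD", "B"), rule("BC", "A"),
      rule("BD", "A"), rule("CF", "D"), rule("DF", "C")]

print(entails(rule("A", "C"), rule("A", "ABC"), GD))  # True: needs AC => B
print(entails(rule("A", "C"), rule("A", "BC"), []))   # False without implications
print(entails(rule("F", "CD"), rule("CF", "D"), []))  # True: plain redundancy
```

The three calls mirror the reasoning in the example. $A \to C$ yields $A \to ABC$ only through $AC \Rightarrow B$, while $F \to CD$ plainly covers $CF \Rightarrow D$ when the latter is read as a partial rule.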

4.5. Double-Support Mining. For many real-life datasets, including all the standard benchmarks in the field, the closure space is huge: it easily reaches hundreds of thousands of nodes, or even millions. A standard practice, as explained in the introduction, is to impose a support constraint, that is, to ignore (closed) sets that do not appear often enough. It has also been observed that the rules removed by this constraint are often appropriately removed, in that they are less robust and more prone to represent statistical artifacts than true information [34]. Hence, we discuss briefly what happens to our basis proposal if we work under such a support constraint.

For a dataset $\mathcal{D}$ and confidence and support thresholds $\gamma$ and $\tau$, respectively, denote by $\mathcal{R}_{\gamma,\tau}$ the set of rules that hold in $\mathcal{D}$ with confidence at least $\gamma$ and support at least $\tau$. We may want to construct either of two similar but different sets of rules. We can ask just how to compute the set of rules in B* that reach that support or, more likely, we may wish for a minimum-size basis for $\mathcal{R}_{\gamma,\tau}$. We solve both problems.

We first discuss a minimum-size basis for $\mathcal{R}_{\gamma,\tau}$. The natural approach is to compute the rule basis exactly as before, but only using closed sets above the support threshold. Indeed, this works:

Proposition 4.13. Fix a dataset $\mathcal{D}$. For any fixed confidence threshold $\gamma$ and support threshold $\tau$, the construction of basic $\gamma$-antecedents, applied only to closed sets of support at least $\tau$, provides a minimum-size basis for $\mathcal{R}_{\gamma,\tau}$.

Proof. Consider any rule $X \to Y$ of support at least $\tau$ and confidence at least $\gamma$. Then $\overline{X}$ is a $\gamma$-antecedent of $\overline{XY}$; also, $s(\overline{X}) = s(X) \geq s(XY) = s(\overline{XY}) \geq \tau$.

Argue as in the proof of Theorem 4.10, but restricted to the closures with support at least $\tau$. We can then find a rule $X' \to Y' - X'$ with the following properties: both $X'$ and $X'Y'$ have support at least $\tau$, $X'$ is a basic $\gamma$-antecedent of $X'Y'$, and $X' \subseteq \overline{X}$ and $XY \subseteq X'Y'$. Hence this rule covers $X \to Y$. Minimum size is argued exactly as in the proof of Theorem 4.11. Following the same steps, one proves that any complete basis consisting of rules in $\mathcal{R}_{\gamma,\tau}$ must have separate rules to cover each of the rules formed by basic $\gamma$-antecedents of closures of support at least $\tau$. □

We are therefore safe if we apply the basis construction for B* to a lattice of frequent closed sets above support $\tau$, instead of the whole lattice of closed sets. However, this fact does not ensure that the basis obtained coincides with the set of rules in the whole basis B* having support above $\tau$. Some rules may be missing from B* because a large closure, of low support, prevents some $X$ from being a basic antecedent. If the large closure is pruned by the support constraint, then $X$ may become a basic antecedent. The following result explains more precisely the relationship between the basis B* and the rules of support at least $\tau$.

Proposition 4.14. Fix a dataset $\mathcal{D}$, a confidence threshold $\gamma$, and a support threshold $\tau$. Assume that $X \subset Y$ and that $s(Y) \geq \tau$. Then $X \to Y - X \in B^*$ if and only if $X$ is a basic $\gamma$-antecedent of $Y$ in the set of all closures of support at least $\gamma \times \tau$.

This proposition says that, in order to find $B^* \cap \mathcal{R}_{\gamma,\tau}$ (the set of rules in B* that have support at least $\tau$), we do not need to compute all the closures and construct the whole of B*. It suffices to perform the B* construction on the set of closures of support at least $\gamma \times \tau$. Of course, in both cases we must then discard the rules of support less than $\tau$. We call this sort of process double-support mining. Given user-defined $\gamma$ and $\tau$, it proceeds in three steps:

1. Use their product to find all closures of support at least $\gamma \times \tau$.
2. Compute B* on these closures.
3. If $B^* \cap \mathcal{R}_{\gamma,\tau}$ is what is desired, prune out the rules with support less than $\tau$.

Proof. Consider a pair of closed sets $X \subset Y$ with $s(X) \geq s(Y) \geq \tau$. We must discuss whether $X$ is a basic $\gamma$-antecedent of $Y$ in two different closure lattices: the lattice of all the closed sets and the lattice of frequent closures at support threshold $\gamma \times \tau$.

The properties of being a $\gamma$-antecedent and of being minimally so refer to $X$ and $Y$ themselves or to even smaller sets. They are therefore unaffected by the support constraint. We only need to discuss the existence of some proper superset of $Y$ having $X$ as a $\gamma$-antecedent. In case $X$ is a basic $\gamma$-antecedent of $Y$, no proper superset $Z$ of $Y$ has $X$ as $\gamma$-antecedent, whatever the support of $Z$. Therefore, $X$ will also be found to be a basic $\gamma$-antecedent of $Y$ in the smaller lattice of frequent closures.

To show the converse, it suffices to argue that, for any proper superset $Z$ of $Y$, if $X$ is a $\gamma$-antecedent of $Z$, then $s(Z) \geq \gamma \times \tau$. Indeed, $s(Z) \geq \gamma s(X) \geq \gamma \times \tau$. Hence, if no such $Z$ is found in the frequent closures lattice at support threshold $\gamma \times \tau$, no such $Z$ exists at all. □
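Proposition 4.14 can be sanity-checked on small random data. The Python sketch below is brute force, exponential in the number of items, so it is meant for toy datasets only. It computes the basic $\gamma$-antecedents on all closures and then keeps the rules of support at least $\tau$. It compares this with double-support mining, which only builds the closures of support at least $\gamma \times \tau$. The sketch makes these assumptions:

- Supports are absolute counts.
- $\gamma$ is an exact `Fraction`.
- A basic $\gamma$-antecedent is read, as in the proofs above, as a closed set that meets three conditions: it is a $\gamma$-antecedent, it is minimal among closed $\gamma$-antecedents of $Y$, and it is a $\gamma$-antecedent of no larger closure.
- The function names and the random data are illustrative, not part of any reported implementation.

```python
import random
from fractions import Fraction
from itertools import combinations


def closed_sets(transactions, min_support=0):
    """Map each closed itemset of support >= min_support to its support."""
    universe = frozenset().union(*transactions)
    closed = {}
    for k in range(len(universe) + 1):
        for combo in combinations(sorted(universe), k):
            covering = [t for t in transactions if frozenset(combo) <= t]
            if len(covering) >= min_support:
                # The closure is the intersection of the covering transactions.
                closure = frozenset.intersection(*covering) if covering else universe
                closed[closure] = len(covering)
    return closed


def basic_antecedents(closed, gamma):
    """Pairs (X, Y) of closures such that X is a basic gamma-antecedent of Y."""
    def antecedent(x, y):
        return x < y and closed[y] >= gamma * closed[x]

    minimal = {(x, y) for y in closed for x in closed
               if antecedent(x, y)
               and not any(w < x and antecedent(w, y) for w in closed)}
    return {(x, y) for x, y in minimal
            if not any(y < z and antecedent(x, z) for z in closed)}


def full_basis_above(transactions, gamma, tau):
    """B* over all closures, keeping only the rules X -> Y - X with s(Y) >= tau."""
    closed = closed_sets(transactions)
    return {(x, y) for x, y in basic_antecedents(closed, gamma) if closed[y] >= tau}


def double_support(transactions, gamma, tau):
    """The same set, computed only from the closures of support >= gamma * tau."""
    closed = closed_sets(transactions, gamma * tau)
    return {(x, y) for x, y in basic_antecedents(closed, gamma) if closed[y] >= tau}


rng = random.Random(0)
for _ in range(30):
    data = [frozenset(i for i in "ABCDE" if rng.random() < 0.6) for _ in range(10)]
    assert double_support(data, Fraction(3, 5), 4) == full_basis_above(data, Fraction(3, 5), 4)
```

Both functions return pairs $(X, Y)$ rather than rules $X \to Y - X$, since the two determine each other. Agreement on random data is evidence for, not a proof of, the proposition.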



4.6. Empirical Evaluation. Whereas our interests in this paper are rather foundational, we wish to describe briefly the direct applicability of our results so far. We have chosen an approach that conveniently uses, as a black box, a separate closed itemsets miner due to Borgelt [8].

We have implemented a construction of the GD basis in two stages. First, a hypergraph transversal method constructs representative rules of confidence 1, following the guidelines of [37]. Second, these rules are simplified to obtain the GD basis as per [3]. We have also implemented a simple algorithm that repeatedly scans the closed sets mined by the separate program and constructs all basic $\gamma$-antecedents. A first scan picks up $\gamma$-antecedents from the proper closed subsets and filters them for minimality. Once all minimal antecedents are available for all closures, a subsequent scan filters out those that are not basic, by way of being antecedents of larger sets. Effectively, the algorithm does not implement the definition but the immediate extension of the characterization in Corollary 3.18 to the closure-based case.

Table 1: Number of rules in various bases for benchmark datasets.
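The two-scan construction of basic $\gamma$-antecedents described above can be sketched in Python as follows. The input is a dictionary mapping each closed set to its support, as a separate closed-set miner would provide. This is a naive illustration under stated assumptions, not the implementation used for the experiments:

- The function name is hypothetical.
- The definition of basic $\gamma$-antecedent is read off from its use in the proofs of Theorems 4.10 and 4.11.
- The toy closures below are those of a hypothetical dataset with transactions ABC, ABC, AB and C.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def basic_gamma_antecedents(supports: Dict[Itemset, int],
                            gamma: Fraction) -> List[Tuple[Itemset, Itemset]]:
    """Rules X -> Y - X of B*, given every closed set together with its support."""
    closed = list(supports)

    def is_antecedent(x: Itemset, y: Itemset) -> bool:
        # X is a gamma-antecedent of Y: X strictly below Y and s(Y) >= gamma * s(X)
        return x < y and supports[y] >= gamma * supports[x]

    # First scan: for each closure Y, keep its minimal closed gamma-antecedents.
    minimal = []
    for y in closed:
        ants = [x for x in closed if is_antecedent(x, y)]
        minimal += [(x, y) for x in ants if not any(w < x for w in ants)]

    # Second scan: discard X if it is also a gamma-antecedent of a larger closure.
    basis = [(x, y) for x, y in minimal
             if not any(y < z and is_antecedent(x, z) for z in closed)]
    return [(x, y - x) for x, y in basis]


# Closed sets, with supports, of the toy dataset ABC, ABC, AB, C.
supports = {frozenset(): 4, frozenset("AB"): 3, frozenset("C"): 3, frozenset("ABC"): 2}
for gamma in (Fraction(3, 4), Fraction(1, 2)):
    print(gamma, basic_gamma_antecedents(supports, gamma))
```

At $\gamma = 3/4$ this yields the two rules $\emptyset \to AB$ and $\emptyset \to C$. At $\gamma = 1/2$ the single rule $\emptyset \to ABC$ makes both redundant, a small instance of the nonmonotonic behavior mentioned later in this section. Both scans are quadratic in the number of closures, which is adequate only for small lattices.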

A natural alternative consists in preprocessing the lattice as a graph in order to find 
the predecessors of a node directly; however, in practice, with this alternative, whenever the 
graph requires too much space, we found that the computation slows down unacceptably, 
probably due to a worse fit to virtual memory caching. Our implementation gives us answers 
in just seconds in most cases, on a mid-range Windows XP laptop, taking a few minutes 
when the closure space reaches a couple dozen thousand itemsets. 

On the basis of this implementation, we have undertaken some empirical evaluations of the sizes of the basis. We consider that the key point of our contribution is the mathematical proof of absolute size minimality. As a mere illustration, however, Table 1 shows the figures for some of the cases explored in [33]. The datasets and thresholds are set exactly as in that reference; column "S/C" gives the confidence and support parameters. Columns "Traditional" (the number of rules under the standard traditional definition [2]) and "Closure-based" (the number of rules obtained by the closure-based method proposed in [33]) are taken verbatim from the same reference. We have added three further figures:

- "RRImp": the number of rules in the representative basis for implications at 100% confidence. This coincides with the iteration-free basis [33] and other proposals, as discussed at the beginning of Subsection 4.1.
- The size of the GD basis for the same implications, which often yields huge savings.
- The number of rules in the B* basis of partial rules. In all of these cases, this coincided with the representative rules at the corresponding thresholds.

As discussed at the end of Section 4.4, representative rules encompass implications, but B* must be taken jointly with the GD basis, so we also give the corresponding sum.

The confidence chosen in [44] for this comparison, namely, coincident with the support threshold, is in our opinion too low to provide a good perspective. At these thresholds, representative rules essentially correspond to support bounds (rules with empty left-hand side).
To complement the intuition, we provide the evolution of the sizes of the representative rules and the B* basis for the dataset pumsb-star, downloaded from [17]. We use the same support thresholds of 40% and 60% as in Table 1, with confidence ranging from 99% down to 51% at 1% granularity. The Guigues-Duquenne bases at these support thresholds consist of 48 and 5 rules, respectively. These have been added to the size of B* in Figures 2 and 3. At these confidence thresholds, the traditional notion of association rules gives from 105086 up to 179684 rules at support 40%, and between 268 and 570 rules at support 60%. Note that, in that notion, association rules are restricted by definition to singleton consequents; lifting this condition, for a fairer comparison with the bases we study, would yield larger numbers. These figures show the advantage of the closure-based basis over representative rules up to the point where the implications become subsumed by partial representative rules.

We also want to point out one interesting aspect of the figures obtained. The standard settings for association rules lead to a monotonicity property: lower confidence thresholds allow for more rules, so the size of the output grows (sometimes enormously) as the confidence threshold decreases. However, in the case of the B* basis and the representative rules, some datasets exhibit a nonmonotonic evolution: at lower confidence thresholds, fewer rules are sometimes obtained. Inspecting the actual rules reveals the reason. Sometimes several rules at, say, 90% confidence become simultaneously redundant due to a single rule of smaller confidence, say 85%, which does not appear at 90% confidence. This may reduce the set of rules upon lowering the confidence threshold.

5. Towards General Entailment 

We now move on to a further contribution of this paper. We propose a stronger notion of redundancy, as progress towards a complete logical approach in which redundancy would play the role of entailment and a sound and complete deductive calculus is sought. Considering the redundancy notions described so far, the following question naturally arises. All these notions relate one partial rule to another partial rule, possibly in the presence of implications. Beyond them, is it indeed possible that a partial rule is entailed jointly by two partial rules, but not by either one of them alone? And, if so, when does this happen? We will fully answer this question below.

The failures of Transitivity and Augmentation may suggest that the answer is negative. It looks as if any combination of two partial rules of confidence at least $\gamma$, with $\gamma < 1$, would require us to multiply confidences, reaching as low as $\gamma^2$ or lower. But this intuition is wrong. We will characterize precisely the case where, at a fixed confidence threshold, a partial rule follows from exactly two partial rules; in this case our previous calculus becomes incomplete. We will also identify one extra deduction scheme that soundly concludes a partial rule from two premise partial rules. The calculus obtained is complete with respect to entailment from two premise rules. We present the whole setting in terms of closure-based redundancy, but the development carries over to plain redundancy, simply by taking the identity as closure operator.

A first consideration is that we no longer have a single value of the confidence to compare. We therefore take a position like the one in most applications of association rule mining in practice: we fix a confidence threshold, and consider only rules whose confidence is above it. An alternative view, further removed from practice, would be to require only that the confidence of all our conclusions be at least the minimum of the confidences of the premises.

Figure 2: Basis sizes per confidence in pumsb-star at 40% support (curves: RR basis size; GD+B* basis size).

As an example, consider the following fact (the analogous statement for $\gamma < 1/2$ does not hold, as discussed below):

Proposition 5.1. Let $\gamma \geq 1/2$. Assume that items A, B, C, D are present in $\mathcal{U}$ and that the confidence of the rules $A \to BC$ and $A \to BD$ is above $\gamma$ in dataset $\mathcal{D}$. Then the confidence of the rule $ACD \to B$ in $\mathcal{D}$ is also above $\gamma$.
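Proposition 5.1 lends itself to a quick randomized sanity check. The Python sketch below draws small random datasets over items A, B, C, D and keeps those in which $A \to BC$ and $A \to BD$ both reach confidence $\gamma$. Among these, it looks for one where $ACD \to B$ falls below $\gamma$; for $\gamma \geq 1/2$ no such dataset should turn up. The sketch assumes the following:

- A rule whose left-hand side has zero support holds vacuously, so its confidence is taken to be 1.
- The sampling parameters and function names are illustrative choices.

A search that finds nothing is evidence, not a proof.

```python
import random
from fractions import Fraction


def confidence(data, lhs, rhs):
    """s(lhs ∪ rhs) / s(lhs); taken as 1 when lhs has zero support (vacuous rule)."""
    covering = [t for t in data if lhs <= t]
    if not covering:
        return Fraction(1)
    return Fraction(sum(1 for t in covering if rhs <= t), len(covering))


def find_counterexample(gamma, trials=5000, seed=0):
    """Search random datasets for a violation of Proposition 5.1 at threshold gamma."""
    rng = random.Random(seed)
    a, bc, bd, acd, b = (frozenset(s) for s in ("A", "BC", "BD", "ACD", "B"))
    for _ in range(trials):
        size = rng.randint(1, 8)
        data = [frozenset(i for i in "ABCD" if rng.random() < 0.7) for _ in range(size)]
        if (confidence(data, a, bc) >= gamma and confidence(data, a, bd) >= gamma
                and confidence(data, acd, b) < gamma):
            return data
    return None


for gamma in (Fraction(1, 2), Fraction(2, 3), Fraction(9, 10)):
    assert find_counterexample(gamma) is None
```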

We do not provide a formal proof of this claim, since it is just the simplest particular case of Theorem 5.3 below. We consider the following definition:



Definition 5.2. Let B be a set of implications and $\mathcal{R}$ a set of partial rules. A rule $X_0 \to Y_0$ is $\gamma$-redundant with respect to them (or $\gamma$-entailed by them), denoted $B, \mathcal{R} \models_\gamma X_0 \to Y_0$, if the following holds: every dataset in which the rules of B have confidence 1 and all the rules in $\mathcal{R}$ have confidence at least $\gamma$ must also satisfy $X_0 \to Y_0$ with confidence at least $\gamma$. The entailment is called "proper" if it does not hold for proper subsets of $\mathcal{R}$; otherwise it is "improper".

Note that, in this case, the parameter $\gamma$ is necessary to qualify the entailment relation itself. In previous sections we had a mere confidence inequality that did not depend on $\gamma$. The main result of this section is now:
Figure 3: Basis sizes per confidence in pumsb-star at 60% support

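Theorem 5.3. Let B be a set of implications, and let $1/2 \leq \gamma < 1$. Consider three partial rules, $X_0 \to Y_0$, $X_1 \to Y_1$, and $X_2 \to Y_2$. Then $B, \{X_1 \to Y_1, X_2 \to Y_2\} \models_\gamma X_0 \to Y_0$ if and only if one of the following holds:

(1) $Y_0 \subseteq \overline{X_0}$, or

(2) $B, \{X_1 \to Y_1\} \models X_0 \to Y_0$, or

(3) $B, \{X_2 \to Y_2\} \models X_0 \to Y_0$, or

(4) all the following conditions simultaneously hold:

(i) $X_1 \subseteq \overline{X_0}$

(ii) $X_2 \subseteq \overline{X_0}$

(iii) $X_1 \subseteq \overline{X_2Y_2}$

(iv) $X_2 \subseteq \overline{X_1Y_1}$

(v) $X_0 \subseteq \overline{X_1Y_1X_2Y_2}$

(vi) $Y_0 \subseteq \overline{X_0Y_1}$

(vii) $Y_0 \subseteq \overline{X_0Y_2}$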
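Every condition in Theorem 5.3 is a containment between itemsets and closures, so the theorem yields a simple decision procedure. The Python sketch below rests on two assumptions. First, the closure $\overline{X}$ is computed by forward chaining over the implications in B. Second, single-premise entailment (cases (2) and (3)) is decided through the characterization of Theorem 4.5, as it is applied in the proofs of Section 4: either $Y_0 \subseteq \overline{X_0}$, or both $X_1 \subseteq \overline{X_0}$ and $X_0Y_0 \subseteq \overline{X_1Y_1}$. The function names are hypothetical. The procedure is meant for $1/2 \leq \gamma < 1$, where the characterization does not depend on the particular value of $\gamma$.

```python
from typing import FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (left-hand side, right-hand side)


def rule(lhs: str, rhs: str) -> Rule:
    """Rule built from one-letter item names, e.g. rule("A", "BC")."""
    return frozenset(lhs), frozenset(rhs)


def closure(x: Itemset, implications: Iterable[Rule]) -> Itemset:
    """Closure of x under the implications, by naive forward chaining."""
    rules: List[Rule] = list(implications)
    result, changed = set(x), True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def entails_one(premise: Rule, conclusion: Rule, impl: List[Rule]) -> bool:
    """B, {X1 -> Y1} |= X0 -> Y0 under closure-based redundancy."""
    (x1, y1), (x0, y0) = premise, conclusion
    cx0 = closure(x0, impl)
    return y0 <= cx0 or (x1 <= cx0 and (x0 | y0) <= closure(x1 | y1, impl))


def entails_two(r1: Rule, r2: Rule, r0: Rule, impl: List[Rule]) -> bool:
    """B, {r1, r2} |=_gamma r0 for 1/2 <= gamma < 1, following Theorem 5.3."""
    (x1, y1), (x2, y2), (x0, y0) = r1, r2, r0

    def cl(s: Itemset) -> Itemset:
        return closure(s, impl)

    if y0 <= cl(x0) or entails_one(r1, r0, impl) or entails_one(r2, r0, impl):
        return True  # cases (1), (2), (3)
    return (x1 <= cl(x0) and x2 <= cl(x0)                  # (i), (ii)
            and x1 <= cl(x2 | y2) and x2 <= cl(x1 | y1)    # (iii), (iv)
            and x0 <= cl(x1 | y1 | x2 | y2)                # (v)
            and y0 <= cl(x0 | y1) and y0 <= cl(x0 | y2))   # (vi), (vii)


# Proposition 5.1 is the instance with an empty set of implications.
print(entails_two(rule("A", "BC"), rule("A", "BD"), rule("ACD", "B"), []))  # True
print(entails_two(rule("A", "BC"), rule("A", "BD"), rule("A", "BCD"), []))  # False
```

In the second call, condition (vi) fails, since $BCD \not\subseteq ABC$. So neither premise nor their combination yields $A \to BCD$.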

Proof. Let us first discuss the leftwards implication. In case (1), rule $X_0 \to Y_0$ holds trivially. Clearly, cases (2) and (3) also give (improper) entailment. For case (4), we must argue that, if all seven conditions hold, then the entailment relationship also holds. Thus, fix any dataset $\mathcal{D}$ where the confidences of the premise rules are at least $\gamma$. These assumptions can be written, respectively, as $s(X_1Y_1) \geq \gamma s(X_1)$ and $s(X_2Y_2) \geq \gamma s(X_2)$, or equivalently for the corresponding closures.

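We have to show that the confidence of $X_0 \to Y_0$ in $\mathcal{D}$ is also at least $\gamma$. Consider the following four sets of transactions from $\mathcal{D}$:

$$A = \{t \in \mathcal{D} \mid t \models X_0Y_0\} \qquad B = \{t \in \mathcal{D} \mid t \models X_0,\ t \not\models X_0Y_0\}$$
$$C = \{t \in \mathcal{D} \mid t \models X_1Y_1,\ t \not\models X_0\} \qquad D = \{t \in \mathcal{D} \mid t \models X_2Y_2,\ t \not\models X_0\}$$

and let $a$, $b$, $c$, and $d$ be the respective cardinalities.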

We first argue that all four sets are mutually disjoint.

This is easy for most pairs. Clearly, A and B have incompatible behavior with respect to $Y_0$. Moreover, a tuple in either A or B has to satisfy $X_0$, which makes it impossible for that tuple to be accounted for in either C or D. The only place where we have to argue a bit more carefully is to see that C and D are disjoint as well. A tuple $t$ that satisfies both $X_1Y_1$ and $X_2Y_2$ satisfies their union $X_1Y_1X_2Y_2$. It must then satisfy every subset of the corresponding closure as well, such as $X_0$, due to condition (v). Hence, C and D are disjoint.

Now we bound the supports of the involved itemsets as follows. Clearly, by definition of A, $s(X_0Y_0) = a$. All tuples that satisfy $X_0$ are accounted for either in A, if they satisfy $Y_0$ as well, or in B, if they don't. Disjointness then guarantees that $s(X_0) = a + b$.

We also see that $s(X_1) \geq a + b + c + d$. Indeed, $X_1$ is satisfied by the tuples in C, by definition; by the tuples in A or B, by condition (i); and by the tuples in D, by condition (iii). Again, disjointness allows us to sum all four cardinalities. Similarly, using (ii) and (iv) instead, we obtain $s(X_2) \geq a + b + c + d$.

The next delicate point is to show an upper bound on $s(X_1Y_1)$, and symmetrically on $s(X_2Y_2)$. We split all the tuples that satisfy $X_1Y_1$ into two sets: those that additionally satisfy $X_0$, and those that don't. Tuples that satisfy $X_1Y_1$ and not $X_0$ are exactly those in C, and there are exactly $c$ of them. By condition (i), satisfying $X_1Y_1$ and $X_0$ is the same as satisfying $X_0Y_1$, and by condition (vi), tuples that do so must also satisfy $Y_0$. Therefore, they satisfy both $X_0$ and $Y_0$, must belong to A, and there can be at most $a$ of them. That is, $s(X_1Y_1) \leq a + c$ and, symmetrically, resorting to (ii) and (vii), $s(X_2Y_2) \leq a + d$.

Thus we can write the following inequalities:

$$a + c \geq s(X_1Y_1) \geq \gamma\, s(X_1) \geq \gamma(a + b + c + d)$$
$$a + d \geq s(X_2Y_2) \geq \gamma\, s(X_2) \geq \gamma(a + b + c + d)$$

Adding them up and using $\gamma \geq 1/2$, we get

$$2a + c + d \geq 2\gamma(a + b + c + d) = 2\gamma(a + b) + 2\gamma(c + d) \geq 2\gamma(a + b) + c + d,$$

that is, $a \geq \gamma(a + b)$, so that

$$c(X_0 \to Y_0) = \frac{s(X_0Y_0)}{s(X_0)} = \frac{a}{a + b} \geq \gamma,$$

as was to be shown.

Now we prove the rightwards direction; the bound $\gamma \geq 1/2$ is not necessary for this part. Since all our supports are integers, we can assume that the threshold is a rational number, $\gamma = m/n$, so that we can count on $n - m > 0$ and $1 \leq m \leq n - 1$. We will argue the contrapositive: assuming that we are in none of the four cases, we show that the entailment does not happen. That is, it is possible to construct a counterexample dataset in which three things hold: all the implications in B hold, the two premise partial rules have confidence at least $\gamma$, and the rule in the conclusion has confidence strictly below $\gamma$. This requires us to construct a number of counterexamples through a somewhat long case analysis. In all of them, all the tuples will be closed sets with respect to B, which ensures that these implications are satisfied in all the transactions.

We therefore assume that case (1) does not happen, that is, $Y_0 \not\subseteq \overline{X_0}$, and that cases (2) and (3) do not happen either. Then Theorem 4.5 tells us that $X_1 \subseteq \overline{X_0}$ implies $X_0Y_0 \not\subseteq \overline{X_1Y_1}$, and that $X_2 \subseteq \overline{X_0}$ implies $X_0Y_0 \not\subseteq \overline{X_2Y_2}$. In the rest of the proof, we will refer to the properties explained in this paragraph as the "known facts".

Then, assuming that case (4) does not hold either, we have to consider multiple ways 
for the conditions (i) to (vii) to fail. Failures of (i) and (ii), however, cannot be argued 
separately, and we discuss them together. 

Case A. Exactly one of (i) and (ii) fails. By symmetry, renaming $X_1 \to Y_1$ into $X_2 \to Y_2$ if necessary, we can assume that (i) fails and (ii) holds. Thus, $X_1 \not\subseteq \overline{X_0}$ but $X_2 \subseteq \overline{X_0}$. Then, by the known facts, $X_0Y_0 \not\subseteq \overline{X_2Y_2}$. We consider a dataset of $n^2$ transactions in total:

- one transaction with the itemset $\overline{X_2Y_2}$;
- $mn - 1$ transactions with the set $\overline{X_0X_1Y_1X_2Y_2}$;
- $n(n - m)$ transactions with the set $\overline{X_0}$.

Then the support of $X_0$ is either $n^2 - 1$ or $n^2$, and the support of $X_0Y_0$ is at most $mn - 1$. This gives the rule $X_0 \to Y_0$ a confidence bounded by $(mn - 1)/(n^2 - 1) < mn/n^2 = \gamma$. However, the premise rules hold. Since (i) fails, the support of $X_1$ is at most $mn$, and the support of $X_1Y_1$ is at least $mn - 1$. This gives $X_1 \to Y_1$ a confidence of at least $(mn - 1)/(mn) \geq m/n = \gamma$. As for $X_2 \to Y_2$, the support of $X_2$ is $n^2$ and that of $X_2Y_2$ is at least $nm$, so its confidence is at least $m/n = \gamma$.

Case B. This corresponds to both (i) and (ii) failing. Then, for a dataset consisting only of transactions $\overline{X_0}$, the premise rules hold vacuously whereas $X_0 \to Y_0$ fails. We can also avoid arguing through rules holding vacuously by means of a dataset consisting of one transaction $\overline{X_0X_1Y_1X_2Y_2}$ and $n$ transactions $\overline{X_0}$.

Remark. For the rest of the cases, we will assume that both (i) and (ii) hold, since the other situations are already covered. Then, by the known facts, we can freely use the properties $X_0Y_0 \not\subseteq \overline{X_1Y_1}$ and $X_0Y_0 \not\subseteq \overline{X_2Y_2}$.

Case C. Assume (iii) fails, that is, $X_1 \not\subseteq \overline{X_2Y_2}$. Consider a dataset consisting of one transaction $\overline{X_0}$, $n$ transactions $\overline{X_1Y_1}$, and $n^2$ transactions $\overline{X_2Y_2}$. Here, by the known facts, the support of $X_0Y_0$ is zero, so it suffices to check that the antecedent rules hold.

- Since (iii) fails and (i) holds, the support of $X_1$ is exactly $n + 1$, and the support of $X_1Y_1$ is at least $n$. This gives a confidence of at least $n/(n + 1) > (n - 1)/n \geq m/n = \gamma$.
- The support of $X_2$ is at most $n^2 + n + 1$, depending on whether (iv) holds. This gives rule $X_2 \to Y_2$ a confidence of at least $n^2/(n^2 + n + 1)$, which is easily seen to be above $(n - 1)/n \geq m/n = \gamma$.

The case where (iv) fails is fully symmetrical and can be argued just by interchanging the roles of $X_1 \to Y_1$ and $X_2 \to Y_2$.

Case D. Assume (v) fails. It suffices to consider a dataset with one transaction $\overline{X_0}$ and $n - 1$ transactions $\overline{X_1Y_1X_2Y_2}$. Using (i) and (ii), the confidence of both premises is at least $(n - 1)/n \geq \gamma$. The support of $X_0$ is 1, and the support of $X_0Y_0$ is zero, by the known fact $Y_0 \not\subseteq \overline{X_0}$ and the failure of (v).
Case E. We assume that (vi) fails; a symmetric argument takes care of the case where (vii) fails. Thus, we have $Y_0 \not\subseteq \overline{X_0Y_1}$. Since this case is treated last, we can assume that (i), (ii), and (v) hold, together with the known facts $X_0Y_0 \not\subseteq \overline{X_1Y_1}$ and $X_0Y_0 \not\subseteq \overline{X_2Y_2}$. We consider a dataset of $n$ transactions in total:

- one transaction $\overline{X_0Y_1}$;
- one transaction $\overline{X_2Y_2}$;
- $m - 1$ transactions $\overline{X_1Y_1X_2Y_2}$;
- $n - m - 1$ transactions $\overline{X_0}$. This last part may be empty, but $n - m - 1 \geq 0$.

By (v), the support of $X_0$ is at least $n - 1$, whereas, given the available facts, the support of $X_0Y_0$ is at most $m - 1$. Since $(m - 1)/(n - 1) < m/n = \gamma$, rule $X_0 \to Y_0$ does not hold. However, the premises hold: all supports are at most $n$, the total size, and the supports of $X_1Y_1$ (using (i)) and $X_2Y_2$ are both $m$.

This completes the proof. □ 
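The counting in Cases A, C, D and E reduces to a handful of inequalities between fractions, with $\gamma = m/n$ and $1 \leq m \leq n - 1$. The following Python sketch lists each bound as stated in the proof and reports any that fail. It is a sanity check over a finite range of $n$, not a replacement for the algebra, and the dictionary keys are merely descriptive labels.

```python
from fractions import Fraction as F


def failed_bounds(n: int, m: int) -> list:
    """Labels of the counting bounds from Cases A, C, D, E that fail for gamma = m/n."""
    gamma = F(m, n)
    checks = {
        "A: n^2 transactions in total": 1 + (m * n - 1) + n * (n - m) == n * n,
        "A: conclusion below gamma": F(m * n - 1, n * n - 1) < gamma,
        "A: first premise": F(m * n - 1, m * n) >= gamma,
        "A: second premise": F(m * n, n * n) >= gamma,
        "C: first premise": F(n, n + 1) > F(n - 1, n) >= gamma,
        "C: second premise": F(n * n, n * n + n + 1) > F(n - 1, n) >= gamma,
        "D: premises": F(n - 1, n) >= gamma,
        "E: n transactions in total": 1 + 1 + (m - 1) + (n - m - 1) == n,
        "E: conclusion below gamma": F(m - 1, n - 1) < gamma,
        "E: premises": F(m, n) >= gamma,
    }
    return [label for label, holds in checks.items() if not holds]


# Every bound holds for all admissible thresholds m/n with n < 60.
assert all(failed_bounds(n, m) == [] for n in range(2, 60) for m in range(1, n))
```

Each entry restates the arithmetic of one construction. For instance, the Case A bound $(mn - 1)/(n^2 - 1) < m/n$ is equivalent to $m < n$.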

A small point that remains to be clarified is the role of the condition $\gamma \geq 1/2$. As indicated in the proof of the theorem, that condition is only necessary in one of the two directions. If there is entailment, the conditions enumerated must hold irrespective of the value of $\gamma$. In fact, for $0 < \gamma < 1/2$, proper entailment from a set of two (or more) premises never holds, and $\gamma$-entailment in general is characterized as (closure-based) redundancy, as per Theorem 4.5 and the corresponding calculus. Indeed:

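Theorem 5.4. Let $0 < \gamma < 1/2$. Then $B, \{X_1 \to Y_1, X_2 \to Y_2\} \models_\gamma X_0 \to Y_0$ if and only if one of the following holds:

(1) $Y_0 \subseteq \overline{X_0}$, or

(2) $B, \{X_1 \to Y_1\} \models X_0 \to Y_0$, or

(3) $B, \{X_2 \to Y_2\} \models X_0 \to Y_0$.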



Proof. The leftwards direction is already part of Theorem 5.3. For the converse, assume that the three conditions fail. As in the previous proof, we have the following known facts: $Y_0 \not\subseteq \overline{X_0}$; $X_1 \subseteq \overline{X_0}$ implies $X_0Y_0 \not\subseteq \overline{X_1Y_1}$; and $X_2 \subseteq \overline{X_0}$ implies $X_0Y_0 \not\subseteq \overline{X_2Y_2}$. We prove that there are datasets giving low confidence to $X_0 \to Y_0$ and high confidence to both premise rules.

If both $X_1 \not\subseteq \overline{X_0}$ and $X_2 \not\subseteq \overline{X_0}$, then we consider one transaction $\overline{X_1Y_1}$, one transaction $\overline{X_2Y_2}$, and a large number $m$ of transactions $\overline{X_0}$. These do not change the confidences of the premises but lead to a confidence of at most $2/m$ for $X_0 \to Y_0$. If $X_1 \not\subseteq \overline{X_0}$ but $X_2 \subseteq \overline{X_0}$ (the symmetric case is handled analogously), we are exactly as in Case A in the proof of Theorem 5.3, and we argue in exactly the same way.

The interesting case is when both $X_1 \subseteq \overline{X_0}$ and $X_2 \subseteq \overline{X_0}$; then both $X_0Y_0 \not\subseteq \overline{X_1Y_1}$ and $X_0Y_0 \not\subseteq \overline{X_2Y_2}$. We fix any integer $k > \gamma/(1 - 2\gamma)$. Since $\gamma < 1/2$, the fraction is positive, and solving for $\gamma$ transforms the inequality into $k/(2k + 1) > \gamma$. (For $\gamma \geq 1/2$, these steps either make the denominator null or reverse the inequality due to a negative sign.) We consider a dataset with one transaction for $\overline{X_0}$ and $k$ transactions for each of $\overline{X_1Y_1}$ and $\overline{X_2Y_2}$. Even in the worst case, where $X_1$, $X_2$, or both show up in all transactions, the confidences of $X_1 \to Y_1$ and $X_2 \to Y_2$ are at least $k/(2k + 1) > \gamma$. Meanwhile, the confidence of $X_0 \to Y_0$ is zero. □
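The construction in the last paragraph can be instantiated on the rules of Proposition 5.1, with $X_1 = X_2 = A$, $Y_1 = BC$, $Y_2 = BD$, $X_0 = ACD$ and $Y_0 = B$. There are no implications, so every itemset is closed. The instance shows that the proposition indeed fails below 1/2. The Python sketch picks the smallest integer $k > \gamma/(1 - 2\gamma)$; the helper names are illustrative.

```python
from fractions import Fraction
from math import floor


def confidence(data, lhs, rhs):
    """Empirical confidence s(lhs ∪ rhs) / s(lhs); assumes lhs occurs in data."""
    covering = [t for t in data if lhs <= t]
    return Fraction(sum(1 for t in covering if rhs <= t), len(covering))


def theorem_5_4_dataset(gamma: Fraction):
    """One transaction X0 = ACD plus k copies each of X1Y1 = ABC and X2Y2 = ABD."""
    if not 0 < gamma < Fraction(1, 2):
        raise ValueError("the construction needs 0 < gamma < 1/2")
    k = floor(gamma / (1 - 2 * gamma)) + 1  # smallest integer k > gamma / (1 - 2 gamma)
    return [frozenset("ACD")] + [frozenset("ABC")] * k + [frozenset("ABD")] * k


for gamma in (Fraction(1, 10), Fraction(2, 5), Fraction(49, 100)):
    data = theorem_5_4_dataset(gamma)
    a, b = frozenset("A"), frozenset("B")
    assert confidence(data, a, frozenset("BC")) > gamma   # premise A -> BC
    assert confidence(data, a, frozenset("BD")) > gamma   # premise A -> BD
    assert confidence(data, frozenset("ACD"), b) == 0     # conclusion ACD -> B
```

For $\gamma = 2/5$, for example, $k = 3$ and both premises reach confidence $3/7 > 2/5$. The only transaction containing $ACD$ lacks $B$.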



5.1. Extending the calculus. We now work towards a rule form, in order to enlarge our calculus with entailment from larger sets of premises. We propose the following additional rule:

$$(2A)\qquad \frac{X_1 \to Y_1,\quad X_2 \to Y_2,\quad X_1Y_1 \Rightarrow X_2,\quad X_2Y_2 \Rightarrow X_1,\quad X_1Y_1X_2Y_2 \Rightarrow Z_1,\quad X_1Y_1Z_1 \Rightarrow Z_2,\quad X_2Y_2Z_1 \Rightarrow Z_2}{X_1X_2Z_1 \to Z_2}$$

and state the following properties:

Theorem 5.5. Given a threshold $\gamma \geq 1/2$ and a set B of implications,

(1) this deduction scheme is sound, and

(2) together with the deduction schemes in Section it gives a calculus complete with respect to all entailments with two partial rules in the antecedent.



Proof. This follows easily from Theorem 5.3, in that the scheme implements the conditions of case (4).
Soundness is seen by directly checking that conditions (i) to (vii) in case (4) of Theorem 5.3
hold: let $X_0 = X_1X_2Z_1$ and $Y_0 = Z_2$; then conditions (i) and (ii) hold trivially, and the rest
are explicitly required, in the form of implications, in the premises (notice that $X_1Y_1 \Rightarrow X_2$
implies that $X_1Y_1Z_1 \Rightarrow Z_2$ and $X_1X_2Y_1Z_1 \Rightarrow Z_2$ are equivalent). Completeness is argued
by considering any rule $X_0 \to Y_0$ entailed by $X_1 \to Y_1$ and $X_2 \to Y_2$ jointly with respect
to the confidence threshold $\gamma$: if the entailment is improper, apply Theorem 4.6; otherwise, just
apply this new deduction scheme with $Z_1 = X_0$ and $Z_2 = Y_0$ to get $X_1X_2X_0 \to Y_0$, and apply (£1)
to obtain $X_0 \to Y_0$. It is easy to see that the scheme is indeed applicable: proper entailment
implies that all seven conditions in case (4) hold and, for $Z_1 = X_0$, we get from (i) and (ii)
that $X_1X_2Z_1 = Z_1$; under this equality, the remaining five conditions provide exactly the
premises of the new deduction scheme. □
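To make the shape of (2A) concrete, here is a minimal Python sketch of a single application of the scheme. It assumes, as is standard for sets of implications, that $B \models X \Rightarrow Y$ holds exactly when $Y$ is contained in the closure of $X$ under $B$, computed by forward chaining; the encoding of rules as `(antecedent, consequent)` pairs of `frozenset`s and the names `closure`, `entails` and `apply_2a` are our own hypothetical choices. The sketch only checks the five implication premises; that the resulting partial rule is actually entailed is what Theorem 5.5 asserts, for $\gamma > 1/2$.

```python
# Hypothetical sketch of one application of scheme (2A); encoding and names are ours.
from typing import FrozenSet, Iterable, List, Optional, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent, consequent)


def closure(items: Itemset, implications: List[Rule]) -> Itemset:
    """Smallest superset of `items` closed under every implication A => C in the list."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for a, c in implications:
            if a <= result and not c <= result:
                result |= c
                changed = True
    return frozenset(result)


def entails(implications: List[Rule], a: Itemset, c: Itemset) -> bool:
    """B |= a => c exactly when c lies inside the closure of a under B."""
    return c <= closure(a, implications)


def apply_2a(B: Iterable[Rule], rule1: Rule, rule2: Rule,
             Z1: Itemset, Z2: Itemset) -> Optional[Rule]:
    """Return the partial rule X1X2Z1 -> Z2 if all five implication premises follow from B."""
    B = list(B)
    (X1, Y1), (X2, Y2) = rule1, rule2
    implication_premises = [
        (X1 | Y1, X2),            # X1Y1 => X2
        (X2 | Y2, X1),            # X2Y2 => X1
        (X1 | Y1 | X2 | Y2, Z1),  # X1Y1X2Y2 => Z1
        (X1 | Y1 | Z1, Z2),       # X1Y1Z1 => Z2
        (X2 | Y2 | Z1, Z2),       # X2Y2Z1 => Z2
    ]
    if all(entails(B, a, c) for a, c in implication_premises):
        return (X1 | X2 | Z1, Z2)
    return None


# Hypothetical example: implications ab => cd and cd => ab, premises a -> b and c -> d.
f = frozenset
B = [(f("ab"), f("cd")), (f("cd"), f("ab"))]
conclusion = apply_2a(B, (f("a"), f("b")), (f("c"), f("d")), Z1=f("ac"), Z2=f("bd"))
print(sorted(conclusion[0]), sorted(conclusion[1]))  # ['a', 'c'] ['b', 'd']
```

With an empty set of implications the same call returns `None`, because $X_1Y_1 = ab$ no longer implies $X_2 = c$.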



6. Discussion 

Our main contribution, at a glance, is a study of confidence-bounded association rules in 
terms of a family of notions of redundancy. We have provided characterizations of several 
existing redundancy notions; we have described how these previous proposals, once the 
relationship to the most robust definitions has been clarified, provide a sound and complete 
deductive calculus for each of them; and we have been able to prove global optimality of 
an existing basis proposal, for the plain notion of redundancy, and also to improve the 
constructions of bases for closure-based redundancy, up to global optimality as well. 

Many existing notions of redundancy discuss redundancy of a partial rule only with
respect to another single partial rule; in Section 5 we have moved beyond this, into the use
of two partial rules. For this approach to redundancy, we believe that this last step has
been undertaken here for the first time; the only other reference we are aware of in which
several partial rules are considered as jointly entailing a partial rule is the early [33],
which used a much more demanding notion of redundancy in which the exact values of the
confidence of the rules were both available in the premises and required in the conclusion.
In our simpler context, we have shown that the following holds: for $0 < \gamma < 1/2$, there is no
case of proper $\gamma$-entailment from two premises; beyond $1/2$, there are such cases, and they
are fully captured in terms of set inclusion relationships between the itemsets involved. We
conjecture that a more general pattern holds.

More precisely, we conjecture the following: for values of the confidence parameter
such that $\frac{n-1}{n} < \gamma < \frac{n}{n+1}$ (where $n > 1$), there are partial rules that are properly
entailed from $n$ premises, partial rules themselves, but there are no proper entailments
from $n+1$ or more premises. That is, intuitively, higher values of the confidence threshold
correspond, successively, to the ability to use more and more partial premises. However,
the combinatorics needed to fully characterize the case of two premises are already difficult enough
for the current state of the art, and progress towards proving this conjecture requires
building intuition to a much greater degree.

This may be, in fact, a way towards stronger redundancy notions and ever smaller
bases of association rules. We wish to be able to establish such more general methods to
reach absolutely minimum-size bases with respect to general entailment, possibly depending
on the value of the confidence threshold $\gamma$, as per the conjecture just stated.

We observe the following: after constructing a basis, be it either the representative rules
or the B* family, it is a simple matter to scan it and check for the existence of pairs of rules
that generate a third rule in the basis according to Theorem 5.3; removing such third
rules then gives a smaller basis with respect to this more general entailment. However,
some preliminary empirical tests suggest that this sort of entailment from two
premises appears very infrequently in practice, so that the check is computationally
somewhat expensive compared to the scarce savings it provides in basis size.
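The scan just described can be sketched in Python under assumptions that are ours rather than the paper's: rules and implications are pairs of `frozenset`s, the implications $B$ are given explicitly, entailment of an implication is decided by forward-chaining closure, and the two-premise test is case (4) of Theorem 5.3 written, as in the proof of Theorem 5.5, as the premises of (2A) with $Z_1 = X_0$ and $Z_2 = Y_0$. The test is only meant for thresholds $\gamma > 1/2$, and the names `case4` and `prune_basis` are hypothetical.

```python
# Hypothetical sketch of pruning a basis with two-premise entailment; names are ours.
from itertools import combinations
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent, consequent)


def closure(items: Itemset, implications: List[Rule]) -> Itemset:
    """Forward chaining of `items` under the implications A => C."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for a, c in implications:
            if a <= result and not c <= result:
                result |= c
                changed = True
    return frozenset(result)


def case4(B: List[Rule], r1: Rule, r2: Rule, r0: Rule) -> bool:
    """Case (4) of Theorem 5.3, as the premises of (2A) with Z1 = X0 and Z2 = Y0."""
    (X1, Y1), (X2, Y2), (X0, Y0) = r1, r2, r0
    if not (X1 <= X0 and X2 <= X0):  # conditions (i) and (ii)
        return False
    needed = [(X1 | Y1, X2), (X2 | Y2, X1), (X1 | Y1 | X2 | Y2, X0),
              (X1 | Y1 | X0, Y0), (X2 | Y2 | X0, Y0)]
    return all(c <= closure(a, B) for a, c in needed)


def prune_basis(basis: List[Rule], B: List[Rule]) -> List[Rule]:
    """Drop every rule obtainable from a pair of distinct rules that are still kept."""
    kept = list(basis)
    for r0 in basis:
        premises = [r for r in kept if r != r0]
        # the conditions are symmetric in the two premises, so unordered pairs suffice
        if any(case4(B, r1, r2, r0) for r1, r2 in combinations(premises, 2)):
            kept.remove(r0)
    return kept


f = frozenset
basis = [(f("a"), f("b")), (f("c"), f("d")), (f("ac"), f("bd"))]
B = [(f("ab"), f("cd")), (f("cd"), f("ab"))]
print(len(prune_basis(basis, B)))  # 2: the rule ac -> bd is dropped
```

A dropped rule is never used afterwards as a premise, so every removed rule remains derivable from the rules that stay. The loop performs a number of closure computations cubic in the basis size, which illustrates why the check can be costly relative to its savings.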

Now that all our contributions are in place, let us briefly review a point that we made in
the Introduction regarding the expected role of the basis. The statement that
association rule mining produces huge outputs, and that this is indeed a problem, is not only
acknowledged in many papers but also becomes self-evident to anyone who has looked
at the output of any of the association miner implementations freely accessible on the web
(see [8], for instance). However, we do not agree that it is one problem: to us, it is, in fact, two
slightly different problems, and confusing them may lead to controversies that are easier to 
settle if we understand that different persons may be interested in different problems, even 
if they are stated similarly. Specifically, let us ask whether a huge output of an association 
miner is a problem for the user, who needs to receive the output of the mining process in a 
form that a human can afford to read and understand, or for the software that is to store 
all these rules, with their supports and confidences. Of course, the answer is "both", but 
the solutions may not coincide. 

Indeed, sophisticated conceptual advances have provided data structures that can be computed
from the given dataset in such a way that, within reasonable computational resource
limits, they are able to give us the support and confidence of any given rule in the given
dataset; in some cases a good approximation is satisfactory enough, and this may allow us to
obtain some efficiency advantages. The set of frequent sets, the set of frequent closures, and
many other methods have been proposed for this task; see [3], [9], [10], [13], [33], [35], [36],
and the surveys [11] and [29].

Our approach is, rather, logical in nature, and aimed at the other variant of the
problem: which rules are irredundant, in a general sense. From these, the redundant rules reaching
the thresholds can be found, "just as rules". Thus, we formalize a situation closer to the
practitioner's process, where a confidence threshold $\gamma$ is enforced beforehand and the rules
with confidence at least $\gamma$ are to be discussed; but we do not need to infer from the basis
the value of the confidence of each of these other rules, because we can recompute it
immediately as a quotient of two supports, found in an additional data structure that we assume
is kept, such as the closures lattice with the supports of each closed set.
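A minimal sketch of that recomputation, assuming the additional data structure is simply a Python dictionary from closed itemsets to their supports (our simplified, hypothetical stand-in for the closures lattice). It relies on the standard fact that an itemset has the same support as its closure, the smallest closed set containing it, so the confidence of $X \to Y$ is a quotient of two lookups. The brute-force `closed_supports` helper, which intersects transactions until no new set appears, only builds a toy example and is not one of the cited constructions; all names here are ours.

```python
# Hypothetical sketch: confidence as a quotient of supports of closed sets; names are ours.
from fractions import Fraction
from typing import Dict, FrozenSet, List, Optional

Itemset = FrozenSet[str]


def closed_supports(dataset: List[Itemset]) -> Dict[Itemset, int]:
    """Brute force (exponential in general, toy use only): closed sets are
    intersections of transactions; map each one to its support."""
    closed = set(dataset)
    frontier = set(dataset)
    while frontier:
        new = {a & b for a in frontier for b in dataset} - closed
        closed |= new
        frontier = new
    return {c: sum(1 for t in dataset if c <= t) for c in closed}


def support(itemset: Itemset, supports: Dict[Itemset, int]) -> int:
    """supp(Z) equals the support of the smallest closed set containing Z (0 if none)."""
    supersets = [c for c in supports if itemset <= c]
    return supports[min(supersets, key=len)] if supersets else 0


def confidence(X: Itemset, Y: Itemset, supports: Dict[Itemset, int]) -> Optional[Fraction]:
    """Confidence of X -> Y recomputed as supp(XY) / supp(X); None if supp(X) = 0."""
    sx = support(X, supports)
    return Fraction(support(X | Y, supports), sx) if sx else None


D = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
S = closed_supports(D)
print(confidence(frozenset("a"), frozenset("b"), S))  # 2/3
```

Taking the superset of minimum size is safe here because closed sets are closed under intersection, so the smallest closed superset of an itemset is unique.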

Therefore, our bases, namely, the already-known representative rules and our new
closure-based proposal B*, are rather "user-oriented": we know that all rules above the
threshold can be obtained from the basis, and we know how to infer them when necessary;
thus, we could, conceivably, guide (or be guided by) the user if (s)he wishes to see all the
rules that can be derived from one of the rules in the basis. This user-guided exploration of
the rules resulting from the mining process is similar to the "direction-setting rules" of [31],
with the difference that their proposal is based on statistical considerations rather than the
logic-based approach we have followed.

The advantage is that our basis is not required to provide as much information as the
bases we have mentioned so far, because the notion of redundancy does not require us to be
able to compute the confidence of the redundant rules. This is why we can reach an optimum
size; indeed, compared to [36] or [H], B* differs because these proposals essentially
pick all minimal generators of each antecedent, which we avoid. The difference is marginal
in the conceptual sense; however, the figures in practical cases may differ considerably, and
the main advantage of our construction is that we can actually prove that there is no better
alternative as a basis for the partial rules with respect to closure-based redundancy.

Further research may proceed along several questions. We believe that a major
breakthrough in intuition is necessary to fully understand entailment among partial rules in its
full generality, whether it confirms our conjecture above or refutes it; variations of our definition
may be worth studying as well, such as removing the separate confidence parameter and
requiring that the conclusion hold with a confidence at least equal to the minimum of the
confidences of the premises.

Other questions are how to extend this approach to the mining of more complex
dependencies [H] or of dependencies among structured objects; however, extending the
development to sequences, partial orders, and trees is not entirely trivial because, as demonstrated
in [7], there are settings where the combinatorial structures may make redundant some
rules that would not be redundant in a propositional (item-based) framework. Additionally,
an intriguing question is: what part of all this discussion remains true if implication
intensity measures different from confidence ([20], [21]) are used?